METHOD AND APPARATUS FOR MEDIA DATA TRANSMISSION

FIELD OF THE INVENTION 

The present invention relates to methods and apparatuses for preparing 
time related sequences of media data for transmission, and more particularly to 
packetized transmission of such media data. 



INTRODUCTION AND BACKGROUND

There are various different file structures used today to store time-based 
media: audio formats such as AIFF, video formats such as AVI, and streaming 
formats such as RealMedia. One reason that such file structures are different is 
their different focus and applicability. Some of these formats are sufficiently
widely accepted, broad in their application, and relatively simple to
implement that they may be used not only for content delivery but also as
interchange formats. Foremost among these general formats is the QuickTime file
format. It is used today in the majority of web sites serving time-based data; in the 
majority of authoring environments, including professional ones; and on the 
majority of multimedia CDROM titles. 

The QuickTime media layer supports the efficient display and management 
of general multimedia data, with an emphasis on time-based material (video, 
audio, etc.). The media layer uses the QuickTime file format as the storage and 
interchange format for media information. The architectural capabilities of the 
layer are generally broader than the existing implementations, and the file format is 
capable of representing more information than is currently demanded by the 
existing QuickTime implementations.

In contrast to formats such as AVI, which were generally designed to
support local random access of synchronized media, QuickTime allows systems to 
manage the data, relationships and timing of a general multimedia presentation. In 
particular, the QuickTime file format has structures to represent the temporal 
behavior of general time-based streams, a concept which covers the time-based 
emission of network packets, as well as the time-based local presentation of 
multimedia data. 

The existing QuickTime file format is publicly described by Apple 
Computer in the May 1996 File format specification, which may be found at the 
QuickTime Web site, <http://www.apple.com/quicktime>.

One aspect of the QuickTime file format is the concept that the physical 
structure of media data (the layout in disk records) is independent of, and 
described by, a logical structure for the file. The file is fully described by a set of
"movie" meta-data. This meta-data provides declarative, structural and temporal 
information about the actual media data. 

The media data may be in the same file as the description data, (the 
"movie" meta-data), or in other file(s). A movie structured into one file is 
commonly called "flat", and is self-contained. Non-flat movies can be structured 
to reference some, or all, of the media data in other files. 

As such, the format is generally suited for optimization in different 
applications. For example, when editing (compositing), data need not be rewritten 
as edits are applied and media is re-ordered; the meta-data file may be extended and 
temporal mapping information adjusted. When edits are complete, the relevant 
media data and meta-data may be rewritten into a single, interleaved, and optimized 
file for local or network access. Both the structured and the optimized files are 
valid QuickTime files, and both may be inspected, played, and reworked. 



The use of structured ("non-flat") files enables the same basic media data to
be used and re-used in any number of presentations. This same advantage applies 
when serving, as will be seen below. 

In both editing and serving, this also permits a number of other files to be 
treated as part of a movie without copying the media data. Thus editing and 
serving may be done directly from files such as Sun Microsystems' "au" audio
format or the AVI video format, greatly extending the utility of these formats. 

The QuickTime file is divided into a set of objects, called atoms. Each
object starts with an atom header, which declares its size and type:

class Atom {
    int(32) size;
    char type[4];
    byte contents[];
}

The size is in bytes, including the size and type header fields. The type 
field is four characters (usually printable), to permit easy documentation and 
identification. The data in an object after the type field may be fields, a sequence 
of contained objects, or both. 

A file therefore is simply a sequence of objects: 

class File {
    Atom[];
}

The two important top-level objects are the media-data (mdat) and the meta- 
data (moov). 
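
As a minimal sketch of the layout just described, and not a part of the QuickTime specification, the following Python code walks the top-level atoms of a byte buffer. It assumes a plain 8-byte header whose 32-bit size is stored big-endian, and it ignores any extended-size conventions. The names `iter_atoms` and `make_atom` and the sample bytes are hypothetical and serve only for illustration.

```python
import struct

def iter_atoms(buf, start=0, end=None):
    """Yield (type, payload_offset, payload_size) for each atom in buf[start:end].

    Assumes every atom has a plain 8-byte header: a big-endian 32-bit size
    (which counts the header itself) followed by a four-character type.
    """
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, atom_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            raise ValueError(f"bad atom size {size} at offset {pos}")
        yield atom_type.decode("latin-1"), pos + 8, size - 8
        pos += size

def make_atom(atom_type, payload):
    """Build one atom; a hypothetical helper used only to create sample data."""
    return struct.pack(">I4s", 8 + len(payload), atom_type) + payload

# A toy "file": a free-space atom, media data, and (empty) meta-data.
data = (make_atom(b"free", b"\x00" * 4)
        + make_atom(b"mdat", b"abcdef")
        + make_atom(b"moov", b""))
atoms = list(iter_atoms(data))
```

Because each header carries its own size, a reader of this kind may step over any atom whose type it does not recognize, which is what permits free-space objects and later extensions to be skipped.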

The media-data object(s) contain the actual media (for example, sequences 
of sound samples). Their format is not constrained by the file format; they are not 
usually objects. Their format is described in the meta-data, not by any declarations 
physically contiguous with them. So, for example, in a movie consisting solely of 
motion-JPEG, JPEG frames are stored contiguously in the media data with no
intervening extra headers. The media data within the media data objects is
logically divided into chunks; however, there are no explicit chunk markers within 
the media data. 

When the QuickTime file references media data in other files, it is not 
required that these 'secondary' files be formatted according to the QuickTime 
specification, since such media data files may be formatted as if they were the 
contents of a media object. Since the QuickTime format does not necessarily 
require any headers or other information physically contiguous with the media 
data, it is possible for the media data to be files which contain 'foreign' headers 
(e.g. UNIX ".au" files, or AVI files) and for the QuickTime meta-data to contain 
the appropriate declarative information and reference the media data in the 'foreign'
file. In this way the QuickTime file format can be used to update, without 
copying, existing bodies of material in disparate formats. The QuickTime file 
format is both an established format and is able to work with, include, and thereby 
bring forward, other established formats. 

Free space (e.g. deleted by an editing operation) can also be described by 
an object. Software reading a file that includes free space objects should ignore 
such free space objects, as well as objects at any level which it does not 
understand. This permits extension of the file at virtually any level by introducing 
new objects. 

The primary meta-data is the movie object. A QuickTime file has exactly 
one movie object which is typically at the beginning or end of the file, to permit its 
easy location: 

class Movie {
    int(32) size;
    char type[4] = 'moov';
    MovieHeader mh;
    Atom contents[];
}

The movie header provides basic information about the overall presentation
(its creation date, overall timescale, and so on). In the sequence of contained 
objects there is typically at least one track, which describes temporally presented 
data. 

class Track {
    int(32) size;
    char type[4] = 'trak';
    TrackHeader th;
    Atom contents[];
}

The track header provides relatively basic information about the track (its 
ID, timescale, and so on). Objects contained in the track might be references to 
other tracks (e.g. for complex compositing), or edit lists. In this sequence of 
contained objects there may be a media object, which describes the media which is 
presented when the track is played. 

The media object contains declarations relating to the presentation required
by the track (e.g. that it is sampled audio, or MIDI, or orientation information for a
3D scene). The type of track is declared by its handler:

class handler {
    int(32) size;
    char type[4] = 'hdlr';
    int(8) version;
    bit(24) flags;
    char handlertype[4];       -- mhlr for media handlers
    char handlersubtype[4];    -- vide for video, soun for audio
    char manufacturer[4];
    bit(32) handlerflags;
    bit(32) handlerflagsmask;
    string componentname;
}

Within the media information there is likewise a handler declaration for the 
data handler (which fetches media data), and a data information declaration, which 
defines which files contain the media data for the associated track. By using this 
declaration, movies may be built which span several files. 



At the lowest level, a sample table is used which relates the temporal aspect
of the track to the data stored in the file:

class sampletable {
    int(32) size;
    char type[4] = 'stbl';
    sampledescription sd;
    timetosample tts;
    syncsampletable syncs;
    sampletochunk stoc;
    samplesize ssize;
    chunkoffset coffset;
    shadowsync ssync;
}

The sample description contains information about the media (e.g. the 
compression formats used in video). The time-to-sample table relates time in the 
track, to the sample (by index) which should be displayed at that time. The sync 
sample table declares which of these are sync (key) samples not dependent on 
other samples. 

The sample-to-chunk object declares how to find the media data for a given
sample, and its description given its index:

class sampletochunk {
    int(32) size;
    char type[4] = 'stsc';
    int(8) version;
    bit(24) flags;
    int(32) entrycount;
    for (int i=0; i<entrycount; i++) {
        int(32) firstchunk;
        int(32) samplesperchunk;
        int(32) sampledescriptionindex;
    }
}

The sample size table indicates the size of each sample. The chunkoffset 
table indicates the offset into the containing file of the start of each chunk. 

Walking the above-described structures to find the appropriate data to 
display for a given time is fairly straightforward, generally involving indexing and
adding. Using the sync table, it is also possible to back up to the preceding sync
sample, and roll forward 'silently' accumulating deltas to a desired starting point.
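
The backing-up operation may be illustrated by the following Python sketch, which is offered only as one possible approach. It assumes that the sync sample table is available as a sorted list of sample numbers; the function names and the example table are hypothetical.

```python
from bisect import bisect_right

def preceding_sync_sample(sync_samples, target):
    """Return the last sync (key) sample at or before `target`.

    `sync_samples` is assumed to be a sorted list of sample numbers.
    """
    i = bisect_right(sync_samples, target)
    if i == 0:
        raise ValueError("no sync sample at or before target")
    return sync_samples[i - 1]

def samples_to_decode(sync_samples, target):
    """Samples to decode 'silently' (without display), ending with the target."""
    start = preceding_sync_sample(sync_samples, target)
    return list(range(start, target + 1))

# Hypothetical table: a key sample every twelve samples.
sync_table = [1, 13, 25, 37]
decode_order = samples_to_decode(sync_table, 29)
```

In this example, a request to start at sample 29 backs up to sync sample 25, and samples 25 through 28 are decoded without being displayed.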

Figure 1 shows the structure of a simple movie with one track. A similar 
diagram may be found in the QuickTime file format documentation, along with a 
detailed description of the fields of the various objects. QuickTime atoms (objects) 
are shown here with their type in a grey box, and a descriptive name above. This 
movie contains a single video track. The frames of video are in the same file, in a 
single chunk of data. It should be noted that the 'chunk' is a logical construct 
only; it is not an object. Inside the chunk are frames of video, typically stored in 
their native form. There are no required headers or fields in the video frames 
themselves. 

Figure 2 is a diagram of a self-contained file with both an audio and a 
video track. Fewer of the atoms are shown here, for brevity; the pointers from the 
tracks into the media data are, of course, the usual sample table declarations, 
which include timing information. 

The QuickTime file format has a number of advantages, including: 

1) Scalability for size and bit-rates. The meta data is flexible, yet compact. 
This makes it suitable for small downloadable movies (e.g. on the Internet) 
as well as providing the basis for a number of high-end editing systems, 
which may require the use of relatively large amounts of uncompressed 
data. 

2) Physical structure is independent of the logical and temporal structure. This 
makes it possible to optimize the physical structure differently depending on 
the use the file will have. In particular, it means that a single file format is 
suitable for authoring and editing; downloading or placing on CDROMs; 
and for streaming. 

3) The file format has proven capable of handling a very broad variety of 
codec types and track types, including many not known at the time the 
format was designed. This proven ability to evolve in an upwards-
compatible fashion is fundamental to the success of a storage format.

Scalable, or layered, codecs can be handled in a number of ways in the
QuickTime file format. For a streaming protocol which supports scalability, the 
samples may be tagged with the layer or bandwidth threshold to be met for 
transmitting the samples. 

Tracks which form a set of alternatives (e.g. different natural language 
sound tracks) can be tagged so that only one is selected for playback. The same 
structure can be used to select alternatives for streaming (e.g. for language 
selection). This capability is described in further detail in the QuickTime file 
format. 

When QuickTime displays a movie or track, the appropriate media handler 
accesses the media data for a particular time. The media handler must correctly
interpret the data stream to retrieve the requested data. For example, with respect 
to video media, the media handler typically traverses several atoms to find the 
location and size of a sample for a given media time. The media handler may 
perform the following: 

1. Determine the time in the media time coordinate system.

2. Examine the time-to-sample atom to determine the sample number that
contains the data for the specified time.

3. Scan the sample-to-chunk atom to discover which chunk contains the
sample in question.

4. Extract the offset to the chunk from the chunk offset atom.

5. Find the offset within the chunk and the sample's size by using the sample
size atom.
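
The steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the behavior of any actual media handler: the tables are represented as simple Python lists instead of atoms, sample and chunk numbers are taken to start at 1, the sample description index is omitted, and all function names and table values are hypothetical.

```python
def sample_for_time(time_to_sample, media_time):
    """Step 2: map a media time to a 1-based sample number.

    `time_to_sample` is a list of (sample_count, sample_duration) runs.
    """
    sample, elapsed = 1, 0
    for count, duration in time_to_sample:
        run_length = count * duration
        if media_time < elapsed + run_length:
            return sample + (media_time - elapsed) // duration
        sample += count
        elapsed += run_length
    raise ValueError("time is past the end of the media")

def chunk_for_sample(sample_to_chunk, chunk_count, sample):
    """Step 3: return (chunk number, first sample in that chunk), both 1-based.

    `sample_to_chunk` is a list of (firstchunk, samplesperchunk) entries; each
    entry applies until the firstchunk of the next entry.
    """
    first_sample = 1
    for i, (first_chunk, per_chunk) in enumerate(sample_to_chunk):
        if i + 1 < len(sample_to_chunk):
            next_first = sample_to_chunk[i + 1][0]
        else:
            next_first = chunk_count + 1
        run_samples = (next_first - first_chunk) * per_chunk
        if sample < first_sample + run_samples:
            chunks_in = (sample - first_sample) // per_chunk
            return first_chunk + chunks_in, first_sample + chunks_in * per_chunk
        first_sample += run_samples
    raise ValueError("sample is not in any chunk")

def locate_sample(tables, media_time):
    """Steps 2-5: return (file offset, size) of the sample shown at media_time."""
    sample = sample_for_time(tables["time_to_sample"], media_time)
    chunk, first_in_chunk = chunk_for_sample(
        tables["sample_to_chunk"], len(tables["chunk_offsets"]), sample)
    offset = tables["chunk_offsets"][chunk - 1]           # step 4
    sizes = tables["sample_sizes"]
    offset += sum(sizes[first_in_chunk - 1:sample - 1])   # step 5
    return offset, sizes[sample - 1]

# Hypothetical tables for a six-sample track.
tables = {
    "time_to_sample": [(6, 100)],          # six samples, 100 time units each
    "sample_to_chunk": [(1, 2), (3, 1)],   # chunks 1-2 hold 2 samples; 3-4 hold 1
    "chunk_offsets": [1000, 2000, 3000, 4000],
    "sample_sizes": [10, 20, 30, 40, 50, 60],
}
```

With these tables, media time 350 falls in sample 4, which is the second sample of chunk 2; its data therefore starts 30 bytes past the chunk offset of 2000.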

It is often desirable to transmit a QuickTime file or other types of time 
related sequences of media data over a data communication medium, which may be 
associated with a computer network (e.g. the Internet). In many computer 
networks, the data which is transmitted into the network should generally be in a
packet form. Normally, time related sequences of media data are not stored in the
proper packetized format for transmission over a network. For example, media 
data files in the QuickTime format are not in a packetized format. Thus, there is a 
need to collect the data, sometimes referred to as streaming data, into packets for 
transmission over a network. 

One prior approach to address the problem of transmitting time related 
sequences of media data over a network is to send the media file over the network 
using a network or transmission protocol, such as the Hypertext Transfer Protocol 
(HTTP). Thus, the media file itself is sent from one computer system over the 
network to another computer system. However, there may be no desire to retain 
the media file at the receiving computing system. That is, when the media file is 
received and viewed or listened to at the receiving computer system, there may be 
no desire or capability by the user of that receiving computer system to store a 
copy of the file, for example, if the receiving computing system is a network 
computer or a computer with low storage capacity. 

Another alternative approach to solving the problem of how to collect data 
for transmission by packets over a network is to prepare a file which contains the 
network protocol data units in the file for a particular transmission protocol. In a 
sense, such a file may be considered a packetized file which is stored in essentially 
the same format as it will be transmitted according to the particular transmission 
protocol. Performing this operation generally involves storing the file in a 
packetized form for a particular network protocol at a particular data transmission 
rate and a particular media file format. Thus, for each different transmission 
protocol at a particular data transmission rate, the file will essentially be replicated 
in its packetized form. The fixed form of such files may restrict their 
applicability/compatibility and make it difficult to view such files locally. Thus,
such an approach may greatly increase storage requirements in attempting to
provide the file in various transmission protocols at various different data 
transmission rates. Moreover, each packetized file generated according to this 
alternative prior approach is generally limited to a particular media file format, and 
thus, other media file formats for the same media object (e.g. a digital movie) are 
typically packetized and stored on the sending computer system. 

Yet another approach to solving the problem of how to stream time related 
sequences of media data is to perform the packetization of the media data when 
required on the transmitting system according to the particular transmission 
protocol which is desired. This processing requires, in many cases, a relatively 
considerable amount of time, and thus, may slow the performance of the 
transmitting system. 

Thus, it is desirable to provide an improved method and apparatus for 
transmitting time related sequences of media data. 

SUMMARY OF THE INVENTION

The present invention relates to information which allows transmission of 
time related data over a data communication medium. In one embodiment, a set of 
data includes at least one instruction to cause a digital processing system to 
generate a set of packets representing a time related sequence of media data. The 
set of packets is associated with a transmission protocol. The set of data includes 
a time related sequence of data which is associated with the time related sequence 
of media data.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows an example of the structure of a simple movie with one 
track in the prior art. 

Figure 2 is an example of a self-contained movie file of the prior art. 

Figure 3 is a flowchart showing one example of a method according to the 
present invention. 

Figure 4 shows an example of a hint track of the present invention. 

Figure 5 shows another example of a hint track of the present invention. 

Figure 6 is a diagram of a network of computer systems in which media 
data may be exchanged and/or processed, according to one embodiment of the 
present invention. 

Figure 7 is a block diagram of a digital processing system which may be 
used in accordance with one embodiment of the present invention. 

Figure 8 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. 

Figure 9 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. 

Figure 10 is a flow diagram illustrating a method for generating hints for 
providing media data transmission, according to one embodiment of the invention. 

Figure 11 is a flow diagram illustrating a method of processing media data
received by a receiving system in accordance with hints, according to one 
embodiment of the invention. 

Figure 12 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a generator, according to one 
embodiment of the invention.

Figure 13 is an example of a machine readable storage medium that may be
accessed by a digital processing system, such as a server, according to one 
embodiment of the invention. 

Figure 14 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a receiving system or other digital 
processing system, according to one embodiment of the invention. 

Figure 15 is a diagram of a data storage and/or communication medium 
having stored/transported thereon media and hint information, according to one 
embodiment of the invention. 

DETAILED DESCRIPTION

The present invention provides methods and apparatuses for allowing the
transmission, and particularly the packetized transmission, of time related
sequences of media data, which may include, for example, video, audio, video
and audio, etc., over a communication medium, such as a computer network.

In one embodiment of the present invention, a digital processing system 
creates a set of data for indicating how to transmit a time related sequence of media 
data according to a transmission protocol. Typically, this set of data is stored on a 
storage device coupled to the digital processing system. Further, this set of data is 
a time related sequence of data associated with the time related sequence of media 
data. 

The present invention may be implemented entirely in executable computer
program instructions which are stored on a computer readable medium or may be
implemented in a combination of software and hardware, or in certain
embodiments, entirely in hardware. Typically, a server computer system coupled
to a network will create the set of data, which may be referred to as a hint track, and
will store this hint track in a storage device which is coupled to the server
computer system. When a client computer system requests a presentation (e.g. a 
viewing or listening or viewing and listening) of a media data file, the server 
system uses the hint track to determine how to packetize the media data for 
transmission to the client computer system. It will be appreciated that the present 
invention is generally applicable to time related sequences of media data, and that 
QuickTime is represented herein as one example of this general applicability. 
Thus, the invention should not necessarily be limited to QuickTime. 

Figure 3 shows one example of a method according to the present 
invention. The method 300 shown in Figure 3 begins in step 301, in which the 
media file format for the particular media data which is desired to be transmitted is 
determined. In step 303, the particular transmission protocol or protocols which 
are desired to be used is also determined. However, steps 301 and 303 are 
optional, for example, in the case where the same media file format is always 
transmitted using the same transmission protocol. 

In step 305, a digital processing system, such as a server computer 
system, creates and stores the hints for packetizing a time related sequence of 
media data in a media file. Alternatively, one computer system may create the 
hints and provide them to another system, such as a server computer system, 
which stores them for later use in a transmission process. The packetization 
allows the transmission over a network or communication media according to the 
desired transmission protocol which was determined in step 303. In one 
embodiment of the present invention, the hints are stored as a track of time related 
sequence of hints which refers to, but which in one embodiment, is separate from 
other tracks of media data. The track of hints, in one embodiment of the present 
invention, may be stored separately from the media data to which it refers. As
such, the track of hints may be stored in a file which is distinct from another file
containing the media data which is referred to by the track of hints, or the track of 
hints may be stored in a hint area in the file containing the media data which is 
separate and distinct from the data area containing the actual media data. In one 
embodiment of the invention, a hint track, or portion thereof, may be interpreted as 
executable instructions by the server, which executable instructions cause the 
server to packetize a time related sequence of data, which is typically, but not 
necessarily, time-based media data. In one embodiment of the present invention, 
the hints are stored on the storage device which is coupled to the transmitting 
digital processing system. 

In step 307, the data, which is packetized according to the hints, is
transmitted from a transmitting system, such as a server computer system, to a 
receiving system. This media data is transmitted by packetizing the media data 
according to the hints. In one alternative embodiment of the invention, the server 
computer system may decide not to use the hints and to send the media data by an 
alternative packetization process. 

In step 309, the receiving system presents the media object which is 
represented by the media data. Typically, this presentation (which may be a 
viewing and listening of a media object or merely a viewing or merely a listening 
of the media object) is performed as the packetized data is received at the receiving 
system. The packetized data may, in one embodiment of the present invention, but 
need not be, stored on the receiving system. Thus the presentation of the data is 
ephemeral in the sense that once the presentation is over, there is no local copy at 
the receiving system. In another embodiment, presentation of the media object 
may take place on the server system subsequent to creating hints for the media data
representing the media object. In one embodiment of the invention, the media data
is not necessarily (re)formatted, copied, etc., for packetization according to hints. 

In step 311, the receiving system may optionally reassemble the media file
if the media file as received has been stored on the receiving system. It will be 
appreciated that the various steps of the method shown in Figure 3 may be 
performed in a different order than the one shown and described above and/or 
some of the steps may be performed simultaneously. For example, in one 
embodiment, steps 309 and 311 are performed in parallel. 

A particular implementation with QuickTime according to one embodiment 
of the present invention will now be described. In one embodiment of the present 
invention, a presentation which can be both viewed locally to the file (e.g., at a 
server, generator, etc.), and streamed over a network within a QuickTime movie is 
provided. In general, the streaming server (or another system) should have 
information about the data units to stream, their composition and timing. Since 
such information is typically temporal it may be described in tracks. A server may 
perform packetization and determine protocol information, for example, by using 
the same indexing operations as would be used to view a presentation. 

The tracks which contain instructions for the servers are sometimes 
referred to as 'hint' tracks, since such tracks represent a set of data to direct the 
server in the process of forming and transmitting packets. The QuickTime file 
format supports streaming of media data over a network as well as local playback. 
The process of sending protocol data units is time-based, just like the display of 
time-based data, and is therefore suitably described by a time-based format. A 
QuickTime file or 'movie' which supports streaming includes information about 
the data units to stream. This information is included in additional tracks of the file
called "hint" tracks.

Hint tracks contain instructions for a streaming server (or other digital
processing system) which assist in the formation of packets. These instructions 
may contain immediate data for the server to send (e.g. header information) or 
reference segments of the media data. In one embodiment of the present 
invention, instructions are encoded in the QuickTime file in the same way that 
editing or presentation information is encoded in a QuickTime file for local 
playback. Instead of editing or presentation information, information may be 
provided which may allow a server to packetize the media data in a manner suitable 
for streaming using a specific network transport. 

In one embodiment of the present invention, the same media data is used in 
a QuickTime file which contains hints, whether it is for local playback, or 
streaming over a number of different transport types. Separate 'hint' tracks for
different transport types may be included within the same file and the media may 
play over all such transport types without making any additional copies of the 
media itself. In addition, existing media may be made streamable by the addition 
of appropriate hint tracks for specific transports. According to one aspect of the 
invention, media data itself need not be recast or reformatted. 

Therefore the samples in a hint track generally contain instructions to form 
packets. These instructions may contain immediate data for the server to send 
(e.g. header information) or reference segments of the media data in another track. 
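
By way of illustration only, the following Python sketch shows how a server might assemble one packet from the instructions in a hint sample. The encoding of the instructions as tuples, the function name `build_packet`, and the sample bytes are hypothetical; the sketch does not represent the actual hint sample format.

```python
def build_packet(instructions, media_data):
    """Assemble one packet from a hint sample's instructions.

    Each instruction is either ("immediate", data), carrying bytes stored in
    the hint itself, or ("media", offset, length), referencing a segment of
    the media data. This instruction encoding is hypothetical.
    """
    parts = []
    for instruction in instructions:
        kind = instruction[0]
        if kind == "immediate":
            parts.append(instruction[1])
        elif kind == "media":
            _, offset, length = instruction
            if offset < 0 or offset + length > len(media_data):
                raise ValueError("media reference is out of range")
            parts.append(media_data[offset:offset + length])
        else:
            raise ValueError(f"unknown instruction {kind!r}")
    return b"".join(parts)

# Two six-byte "frames" of media data and a hint sample for the second frame.
media = b"FRAME0FRAME1"
hint_sample = [("immediate", b"HDR:"), ("media", 6, 6)]
packet = build_packet(hint_sample, media)
```

In this sketch the media bytes are read in place when the packet is formed, so the hint sample need only store the header bytes and a reference, and the media data itself is not duplicated.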

In one embodiment of the present invention, a three-level design is utilized 
such that: 

1) The media data is represented as a set of network-independent tracks, 
which may be played, edited, and so on, as normal; 

2) There is a common declaration and base structure for server hint tracks; 
this common format is protocol independent, but contains the 
declarations of which protocol(s) are described in the server track(s);

3) There is a specific design of the server hint tracks for each protocol
which may be transmitted; all these designs use the same basic 
structure. For example, there may be designs for RTP (for the 
Internet) and MPEG-2 transport (for broadcast), or for new standard or 
vendor-specific protocols. 

In one embodiment of the present invention, the resulting streams, sent by 
the servers under the direction of the hint tracks, are normal streams, and do not 
necessarily include a trace of QuickTime information. This embodiment of the 
invention does not require that QuickTime, or its structures or declaration style, 
necessarily be either in the data on the transmission medium (e.g. network cable) 
or in the decoding station. For example, a file using H.261 video and DVI audio, 
streamed under RTP, may result, in one embodiment of the present invention, in a 
packet stream which is fully compliant with the IETF specifications for packing 
those codings into RTP. 

In one embodiment of the invention, hint tracks are built and flagged so 
that when the presentation is viewed locally, the hint tracks are essentially ignored 
by a receiving system. 

In one embodiment, a time related sequence of media data, which may, for 
example, include video, audio, etc., may be packetized by a digital processing 
system, and then presented on the same digital processing system. Furthermore, 
packetization may be ephemeral, such that the time related sequence being 
presented, stored, read, etc., is also packetized "on the fly." In one embodiment, 
hints may refer to media data that has not been copied, formatted, etc.; for 
example, the media data to which hints refer may be stored in original format on a 
read-only memory, etc. 

In one embodiment, the same hinting routine that provides packetization 
also presents the media as packetization is performed. In alternative embodiments
of the invention, a packetized file of time related media data may be generated
according to hint tracks and stored, for example, for later transmission. 

Figure 4 illustrates utilization of hint tracks for transporting media data, 
according to one embodiment of the invention. In Figure 4, a hint track 401 is 
shown for the media track 403. Each hint track sample, such as hint track sample 
405 — which describes how to form an RTP packet — may contain a header, and 
may reference some data from an associated media track — in this case, a video 
track 403. In the embodiment shown in Figure 4, the media data (the video 
frames) and the RTP hints have been interleaved so that the associated media file 
may be read relatively easily. In this example, each frame is shown as fitting into 
a single RTP packet. Of course, it is possible to split frames into several packets 
when needed. Conversely, multiple frames can, if desired, be placed in a single 
packet, which is commonly performed with audio data. 
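The relationship between a hint track sample and the packet it describes can be sketched as follows. This is a minimal illustration, not the hint format of the Appendixes: the `HintSample` fields, the `build_packet` helper, the SSRC value, and the sample bytes are all hypothetical. The only thing taken from the RTP specification is the fixed 12-byte header layout (version 2, payload type, sequence number, timestamp, SSRC); payload type 31 is the static assignment for H.261.

```python
import struct
from dataclasses import dataclass


@dataclass
class HintSample:
    """Hypothetical hint sample: header fields plus a reference into a media track."""
    payload_type: int   # RTP payload type (7 bits)
    sequence: int       # RTP sequence number (16 bits)
    timestamp: int      # RTP timestamp (32 bits)
    media_offset: int   # where the referenced media bytes start in the track
    media_length: int   # how many media bytes to copy into the packet


def build_packet(hint: HintSample, media: bytes, ssrc: int = 0x1234) -> bytes:
    """Form one packet by following the hint; no knowledge of the codec is needed."""
    # Fixed RTP header: version 2, no padding, no extension, no CSRCs, marker clear.
    header = struct.pack(
        "!BBHII",
        0x80,
        hint.payload_type & 0x7F,
        hint.sequence & 0xFFFF,
        hint.timestamp & 0xFFFFFFFF,
        ssrc,
    )
    payload = media[hint.media_offset:hint.media_offset + hint.media_length]
    return header + payload


# Two stand-in "frames", each fitting in a single packet as in Figure 4.
media_track = b"FRAME0FRAME1"
hints = [HintSample(31, 0, 0, 0, 6), HintSample(31, 1, 3000, 6, 6)]
packets = [build_packet(h, media_track) for h in hints]
```

Because the hint only references the media bytes, the media track itself is left unmodified and remains playable locally.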

As discussed above, the logical structure need not imply physical 
structure. The meta data may be cached in memory, and the hint track 
samples physically interleaved with the media samples to which they refer (as is 
shown in Figure 4). 

Alternatively, it is possible to write a new set of meta data and media data, 
containing the hint tracks, which references and augments the meta data and media 
data in an existing presentation. Figure 5 illustrates utilization of hint tracks to 
reference media data in a separate file, according to one embodiment of the 
invention. In Figure 5, two movie files 502 and 504 are shown, each with their 
own meta-data. The first, the movie file 502, includes a video track. The second, 
the movie file 504, contains both a video track and a hint track, but the meta-data 
declares that the media data for the video track is in the first movie 502. Thus the 
hints associated with the movie file 504 also point to the media data in the first 
movie 502. 

In one embodiment of the present invention, a media file may contain 
packetization hint tracks for multiple protocols. As such, each track may contain 
declarations of the protocol (and protocol parameters, if appropriate) for which the 
hint track is appropriate. These tracks may all, of course, reference media data 
from the basic media tracks in the file. The desire for protocol independence and 
extensibility may be met in the described manner. 

In one embodiment of the present invention, hint tracks need not use all the 
data in the media tracks. The hint tracks may use a subset of the data (e.g. by 
omitting some video frames) to reach a bandwidth threshold, or for other reasons. 
Since multiple hint tracks may be provided for the same protocol, differing subsets 
of the same basic media information at different rates may be provided. As such, 
the present invention may provide improved scalability over prior methods and 
apparatuses. 
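As a rough illustration of a hint track that uses only a subset of the media data to stay under a bandwidth threshold, the sketch below keeps every k-th frame, choosing the smallest k whose average bit rate fits. The function name `thin_frames` and the every-k-th policy are assumptions made for this example; a real hinter would also have to respect inter-frame dependencies of the codec, which this sketch ignores.

```python
def thin_frames(frame_sizes, fps, max_bps):
    """Return indices of frames to keep so the average bit rate is at most max_bps.

    frame_sizes: size of each frame in bytes, in presentation order.
    fps:         frames per second of the original track.
    max_bps:     bandwidth threshold in bits per second.
    """
    if not frame_sizes:
        return []
    duration = len(frame_sizes) / fps  # seconds covered by the original track
    for k in range(1, len(frame_sizes) + 1):
        kept = list(range(0, len(frame_sizes), k))
        bits = 8 * sum(frame_sizes[i] for i in kept)
        if bits / duration <= max_bps:
            return kept
    return []  # even a single frame exceeds the threshold


sizes = [1000] * 10  # ten 1000-byte frames at 10 fps: 80,000 bits per second
full_rate = thin_frames(sizes, 10, 80_000)
half_rate = thin_frames(sizes, 10, 40_000)
low_rate = thin_frames(sizes, 10, 30_000)
```

Each returned index list could back a separate hint track for the same protocol, so that one set of media data serves several data rates.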

It should be emphasized that though the hint tracks themselves, and the 
QuickTime meta-data, should, in one embodiment, be in QuickTime files, the base 
media can be left in any file type which QuickTime can import and reference in 
place. In one embodiment of the present invention, the meta-data in the movie file 
may include a data reference which declares that the media data is in another file. 
The sample table offsets and pointers may thus refer to data in this "foreign" file. 
Thus, according to one embodiment of the present invention, existing legacy 
formats such as "au" audio files, "AVI" audio/video files, and MIDI files, may be 
streamed without requiring the copying or reformatting of the base media data. 
Since the base media data is not written to, but merely augmented by QuickTime 
declarations and hint information in separate files, the base media data may also be 
provided on read-only machine readable media such as CDROM. 
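The idea of a data reference that points into a foreign file can be sketched as below. The dictionary layout, the file name `legacy.au`, and the `read_sample` helper are hypothetical stand-ins; they are not the QuickTime atom structures, only an illustration of meta-data holding offsets into a file that is never rewritten.

```python
# Stand-in for a read-only legacy file (for example, on CDROM); never modified.
files = {"legacy.au": b"AAAABBBBCCCC"}

# Hypothetical track meta-data kept in a separate movie file.
audio_track = {
    "data_ref": "legacy.au",                # declares where the media data lives
    "samples": [(0, 4), (4, 4), (8, 4)],    # sample table: (offset, length) pairs
}


def read_sample(track, index, file_store):
    """Resolve one sample by following the data reference and the sample table."""
    offset, length = track["samples"][index]
    return file_store[track["data_ref"]][offset:offset + length]


second_sample = read_sample(audio_track, 1, files)
```

A hint track in the same movie file could use the same offsets, so streaming needs no copy of the base media.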

In one embodiment of the present invention, the hint tracks embody the 
results of off-line computation and are typically optimized to provide the server 
with information to support packetization, and if needed, multiplexing. 

Example hints for RTP (the IETF standard Real-time Transport Protocol) and 
MPEG-2 transport are shown in Appendixes A-C. 

In one embodiment of the present invention, a single file may support hint 
tracks for multiple protocols, or multiple different parameterizations of the same 
protocols, without undue space overhead. New protocols, and their associated 
hint tracks, may be designed without disrupting systems relying on existing 
protocols. Thus the invention, at least in one embodiment, is protocol-neutral. 

In the QuickTime file format, a track may be added to the movie by 
updating or copying and augmenting the meta-data. If the media data is in files 
separate from the meta-data, or optimized interleave is not required, this can be a 
relatively simple and efficient operation. 

In one embodiment of the present invention, tracks may be extracted by 
building a new set of movie meta-data which contains only one track, and which 
can, if desired, reference the media data in the original. 

For example, in one embodiment of the present invention, a new audio 
track may be added which is marked as being an alternative to a set of other audio 
tracks. If it is also marked with the language code (e.g. French, or Tagalog), then 
the appropriate track may be selected at presentation time. 
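Selection among alternative tracks by language might look like the following sketch. The `Track` fields and the `pick_track` function are hypothetical names introduced for illustration, and the language strings are illustrative codes; the fallback to the first track in the group is an assumption, not something the description specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    track_id: int
    alternate_group: int  # tracks sharing a group are alternatives to one another
    language: str         # illustrative language code, e.g. "fra" or "tgl"


def pick_track(tracks: List[Track], group: int, preferred: str) -> Optional[Track]:
    """Pick the track in the group matching the preferred language, else the first."""
    candidates = [t for t in tracks if t.alternate_group == group]
    for track in candidates:
        if track.language == preferred:
            return track
    return candidates[0] if candidates else None


tracks = [
    Track(1, 0, "eng"),   # video track, not part of the audio group
    Track(2, 1, "eng"),
    Track(3, 1, "fra"),   # newly added alternative audio track
    Track(4, 1, "tgl"),
]
chosen = pick_track(tracks, group=1, preferred="fra")
```

Adding the new track only extends the list of alternatives; the existing tracks are not touched.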

SMPTE time-code tracks are an example of elementary streams which may 
be present, added, or removed, as need arises, according to one embodiment of 
the invention. 

According to one aspect of the invention, hint tracks may permit the 
development of new formats for new protocols without causing compatibility 
issues for existing servers or local playback. In addition, new media tracks may 
be added over the life of the file format while maintaining backwards 
compatibility. 

In one embodiment of the present invention, the areas of extensibility 
include: 

a) New track types which can be defined for media types not covered by the 
current QuickTime file format (e.g. laboratory instrument readings). 

b) New coding types for existing tracks which may be defined (e.g. video or 
audio codecs). There is explicit provision for their codec-specific initialization 
information. 

c) New hint track types which may be defined for new protocols, and a file 
which may contain hint information for more than one protocol without 
incurring a space overhead for the media data itself. 

Existing content on read-only media may be used with the present 
invention (e.g., prepackaged movies on CD ROM, DVD, etc.). 

Furthermore, according to one aspect of the invention, various "foreign" 
file formats may be used. In one embodiment of the present invention, for 
example, if the existing content is either in QuickTime format, or can be imported, 
it may be edited and streamed without requiring copying or re-formatting. 

In one embodiment of the present invention, if a codec supports striping of 
the media data to achieve scalability of bandwidths, then these striped bandwidths 
may be represented using multiple stream tracks. Each track may represent a 
different bandwidth. Tracks may be grouped together in selected subsets of the 
basic media. 

In one embodiment of the present invention, if a protocol supports 
bandwidth scalability, then the hint track itself may contain information for each 
protocol data unit (sample in the hint track). Information may include the 
bandwidth threshold above which the protocol data unit should be delivered to the 
network. Thus, hint tracks may indicate an available bandwidth as being high, 
low, etc., and/or other information relating to bandwidth for data transmission. 
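A per-PDU bandwidth threshold can be applied with a simple filter, sketched below. The dictionary keys `name` and `threshold_bps` and the function `select_pdus` are hypothetical; the sketch assumes a PDU is delivered only when the available bandwidth is strictly above its threshold, following the wording above.

```python
def select_pdus(pdus, available_bps):
    """Keep only the PDUs whose bandwidth threshold is below the available bandwidth."""
    return [p for p in pdus if available_bps > p["threshold_bps"]]


# Hypothetical hint samples: a base layer plus two enhancement layers.
pdus = [
    {"name": "base", "threshold_bps": 0},
    {"name": "enhance-1", "threshold_bps": 64_000},
    {"name": "enhance-2", "threshold_bps": 256_000},
]

modem_link = [p["name"] for p in select_pdus(pdus, 56_000)]
lan_link = [p["name"] for p in select_pdus(pdus, 1_000_000)]
```

The server applies the same rule to every hint sample and needs no knowledge of how the codec layers its data.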

In one embodiment of the present invention, if the protocol is a 
multiplexing protocol (e.g. MPEG-2 transport) then different hint tracks may be 
built which use a different subset of the elementary stream tracks to achieve 
different data-rates. Hence, some tracks may be omitted entirely for low bit-rate 
transmission. 

In one embodiment of the present invention, if it is desired to record the 
base data using different codecs, then those tracks may be formed into a group of 
alternatives, and only one selected for presentation. The selection of which track 
to use for presentation is typically protocol-dependent and may be achieved by 
using the hint track approaches described herein. 

In one embodiment of the present invention, encryption may also be pre- 
applied to a media file. In this case, the encrypted data may be stored in either (a) 
a new elementary stream (a new track) which is linked to the original media data 
(or the original media data may be removed if it is no longer needed) or (b) the hint 
track itself. In case (b), it is possible that the hint track does not extract any data 
from the elementary un-encrypted stream on the fly. Thus, all of the media data 
may be in the hint track as well as the streaming packet protocol data unit 
information, because the media data may be transformed by encryption. 

As an example of embedded object content information, the IETF session 
description information for a whole movie, and for individual tracks, may be 
stored in the meta-data for the RTP hint tracks, as user atoms. 

In one embodiment of the present invention, a file format typically contains 
both media data in a playable format, and streaming information. In one 
embodiment, it is possible to stream directly from this format with relatively low 
overhead, while preserving the media independence, protocol independence, and 
ability to present the media locally. 

According to one aspect of the invention, hint tracks may abstract detailed 
knowledge of codecs, timing and packetization, into an off-line preparation 
process. Thus, following the hint tracks to generate the data stream may be 
relatively simple and require no specialized knowledge of the media being 
streamed. Thus, decoupling of a server, for example, from the details of the data 
content may be provided, according to one aspect of the invention. 

In one embodiment of the present invention, a set of hint tracks may be 
used to construct a file which is directly optimized for streaming — for example, by 
laying out network PDUs on disk at logical disk boundaries, in the time sequence 
in which they should be sent. Such a file may no longer be a general presentation, 
but may be streamed. In one embodiment, packetized files created with hint tracks 
may be stored and, for example, later optimized for streaming. 
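One way to picture a file laid out purely for streaming is sketched below: PDUs are sorted by send time and each is padded to start on a block boundary. The function `layout_for_streaming`, the block size, and the interpretation of "logical disk boundaries" as fixed-size block alignment are assumptions made for this example.

```python
def layout_for_streaming(pdus, block_size):
    """Write PDUs in send-time order, each starting on a block boundary.

    pdus: iterable of (send_time, data) pairs.
    Returns the laid-out bytes and an index of (send_time, offset, length).
    """
    out = bytearray()
    index = []
    for send_time, data in sorted(pdus, key=lambda p: p[0]):
        padding = (-len(out)) % block_size   # bytes needed to reach the next boundary
        out.extend(b"\x00" * padding)
        index.append((send_time, len(out), len(data)))
        out.extend(data)
    return bytes(out), index


image, index = layout_for_streaming([(2, b"cc"), (1, b"aaa")], block_size=4)
```

A server reading such a file can issue sequential, aligned reads in the order of the index, at the cost of the file no longer being a general presentation.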

In one embodiment of the present invention, by encapsulating foreign file 
formats, media data may be retained in other formats while still being published in 
QuickTime. For example, an existing format may be directly encapsulated into a 
new media data file by applying the proper wrapper, or may be left intact and 
referred to in segments or as a whole by the hint track, allowing the legacy formats 
to be streamed without copying. A single movie may contain pieces selected from 
multiple legacy formats. This invention does not constrain the base media format. 

A common format which spans capture, authoring and editing, download and 
streaming will generally provide flexibility. Material may be reworked after use, 
or used in multiple ways, without being copied or re-formatted. In one 
embodiment of the present invention, it is possible to re-work 
and re-use material which has been hinted, by stripping the hint tracks, using 
standard editors, and then re-hinting after editing is completed. 

If it is desired that a media file be downloaded for local viewing, an 
optimized interleaved file may be built for that purpose, with the streaming meta- 
data in a separate declaration file referencing the same base media data. The 
download may not, therefore, include the streaming information, and yet the media 
data may be present only once at a streaming server. 

By separating logical structure from physical structure, the physical 
structure of the file may be optimized differently depending on the application (e.g. 
editing, local viewing, streaming). 

By permitting the existence of multiple hint tracks for each media track, in 
one embodiment of the present invention, the file may be published by streaming 
over multiple protocols, without requiring multiple copies of the media. 

Figure 6 is a diagram of a network of computer systems in which media 
data may be processed, according to one embodiment of the present invention. As 
shown in Figure 6, a number of client computer systems, one or more of which 
may represent one implementation of the receiving system described above with 
reference to Figure 3, are coupled together through an Internet 622. It will be 
appreciated that the term "Internet" refers to a network of networks. Such 
networks may use a variety of protocols for exchange of information, such as 
TCP/IP, ATM, SNA, SDI, etc. The physical connections of the Internet and the 
protocols and communication procedures of the Internet are well known to those in 
the art. Access to the Internet 622 is typically provided by Internet service 
providers (ISPs), such as the ISP 624 and the ISP 626. Users on client systems, 
such as the client computer systems 602, 604, 618, and 620, generally obtain 
access to the Internet through Internet service providers, such as ISPs 624 and 
626. Access to the Internet may facilitate transfer of information (e.g., email, text 
files, media files, etc.) between two or more digital processing systems, such as 
the client computer systems 602, 604, 618, and 620 and/or a Web server system 
628. For example, one or more of the client computer systems 602, 604, 618, 
and 620 and/or the Web server 628 may provide media data (e.g., video and 
audio, or video, or audio) to another one or more of the client computer systems 
602, 604, 618, and 620 and/or the Web server 628. Such may be provided in 
response to a request. As described herein, such media data may be transferred in 
the system 600 according to hints. Such hints, in one embodiment of the invention, 
may be created according to a specific format of the media data and/or a specific 
data communication (e.g., network) protocol(s). 

The Web server 628 is typically comprised of at least one computer system 
to operate with one or more data communication protocols, such as the protocols 
of the World Wide Web, and as such, is typically coupled to the Internet 622. 
Optionally, the Web server 628 may be part of an ISP which may provide access 
to the Internet and/or other network for client computer systems. The client 
computer systems 602, 604, 618, and 620 may each, with appropriate web 
browsing software, access data, such as HTML documents (e.g., Web pages), 
which may be provided by the Web server 628. Such data may provide media, 
such as QuickTime movies, which may be presented by the client computer 
systems 602, 604, 618, and 620. 

The ISP 624 provides Internet connectivity to the client computer system 
602 via a modem interface 606, which may be considered as part of the client 
computer system 602. The client computer system may be a conventional 
computer system, such as a Macintosh computer, a "network" computer, a 
handheld/portable computer, a Web TV system, or other types of digital 
processing systems (e.g., a cellular telephone having digital processing 
capabilities). Similarly, the ISP 626 provides Internet connectivity for the client 
computer systems 604, 618 and 620, although as depicted in Figure 6, such 
connectivity may vary between various client computer systems, such as the client 
computer systems 602, 604, 618, and 620. For example, as shown in Figure 6, 
the client computer system 604 is coupled to the ISP 626 through a modem 
interface 608, while the client computer systems 618 and 620 are part of a Local 
Area Network (LAN). The interfaces 606 and 608, shown as modems 606 and 
608, respectively, in Figure 6, may be an analog modem, an ISDN modem, a 
cable modem, a satellite transmission interface (e.g., "Direct PC"), a wireless 
interface, or other interface for coupling a digital processing system, such as a 
client computer system, to another digital processing system. The client computer 
systems 618 and 620 are coupled to a LAN bus 612 through network interfaces 
614 and 616, respectively. The network interfaces 614 and 616 may be an 
Ethernet-type, Asynchronous Transfer Mode (ATM), or other type of network 
interface. The LAN bus is also coupled to a gateway digital processing system 
610, which may provide firewall and other Internet-related services for a LAN. 
The gateway digital processing system 610, in turn, is coupled to the ISP 626 to 
provide Internet connectivity to the client computer systems 618 and 620. The 
gateway digital processing system 610 may, for example, include a conventional 
server computer system. Similarly, the Web server 628 may, for example, include 
a conventional server computer system. 

The system 600 may allow one or more of the client computer systems 
602, 604, 618, and 620 and/or the Web server 628 to provide media data (e.g., 
video and audio, or video, or audio) to another one or more of the client computer 
systems 602, 604, 618, and 620 and/or the Web server 628. Such data may be 
provided, for example, in response to a request by a receiving system, which may 
be, for example, one or more of the client computer systems 602, 604, 618, and 
620. As described herein, such media data may be transferred in the system 600 
according to hints or hint tracks. Such hints, in one embodiment of the invention, 
may be created according to a specific format of the media data and/or a specific 
data communication (e.g., network) protocol(s) to allow, according to one aspect 
of the invention, packetization of media data. 

Figure 7 is a block diagram of a digital processing system which may be 
used in accordance with one embodiment of the present invention. For example, 
the digital processing system 650 shown in Figure 7 may be used as a client 
computer system, a Web server system, a conventional server system, etc. 
Furthermore, the digital processing system 650 may be used to perform one or 
more functions of an Internet service provider, such as the ISP 624 or 626. The 
digital processing system 650 may be interfaced to external systems through a 
modem or network interface 668. It will be appreciated that the modem or 
network interface 668 may be considered as part of the digital processing system 
650. The modem or network interface 668 may be an analog modem, an ISDN 
modem, a cable modem, a token ring interface, a satellite transmission interface, a 
wireless interface, or other interface(s) for providing a data communication link 
between two or more digital processing systems. 

The digital processing system 650 includes a processor 652, which may 
represent one or more processors and may include one or more conventional types 
of such processors, such as a Motorola PowerPC processor, an Intel Pentium (or 
x86) processor, etc. A memory 654 is coupled to the processor 652 by a bus 656. 
The memory 654 may be a dynamic random access memory (DRAM) and/or may 
include static RAM (SRAM). The processor may also be coupled to other types of 
storage areas/memories (e.g., cache, Flash memory, disk, etc.), which could be 
considered as part of the memory 654 or separate from the memory 654. 

The bus 656 further couples the processor 652 to a display controller 658, 
a mass memory 662, the modem or network interface 668, and an input/output 
(I/O) controller 664. The mass memory 662 may represent a magnetic, optical, 
magneto-optical, tape, and/or other type of machine-readable medium/device for 
storing information. For example, the mass memory 662 may represent a hard 
disk, a read-only or writeable optical CD, etc. The display controller 658 controls 
in a conventional manner a display 660, which may represent a cathode ray tube 
(CRT) display, a liquid crystal display (LCD), a plasma display, or other type of 
display device. The I/O controller 664 controls I/O device(s) 666, which may 
include one or more keyboards, mouse/trackball or other pointing devices, 
magnetic and/or optical disk drives, printers, scanners, digital cameras, 
microphones, etc. 

It will be appreciated that the digital processing system 650 represents only 
one example of a system, which may have many different configurations and 
architectures, and which may be employed with the present invention. For 
example, Macintosh and Intel systems often have multiple busses, such as a 
peripheral bus, a dedicated cache bus, etc. On the other hand, a network 
computer, which may be used as a digital processing device of the present 
invention, may not include, for example, a hard disk or other mass storage device, 
but may receive routines and/or data from a network connection, such as the 
modem or interface 668, to be processed by the processor 652. Similarly, a Web 
TV system, which is known in the art, may be considered to be a digital 
processing system of the present invention, but such a system may not include one 
or more I/O devices, such as those described above with reference to I/O device(s) 
666. Additionally, a portable communication and data processing system, which 
may employ a cellular telephone and/or paging capabilities, may be considered a 
digital processing system which may be used with the present invention. 

In the system 650 shown in Figure 7, the mass memory 662 (and/or the 
memory 654) may store media (e.g., video, audio, movies, etc.) which may be 
processed according to the present invention (e.g., by way of hints). Alternatively, 
media data may be received by the digital processing system 650, for example, via 
the modem or network interface 668, and stored and/or presented by the display 
660 and/or I/O device(s) 666. In one embodiment, packetized media data may be 
transmitted across a data communication network, such as a LAN and/or the 
Internet, in accordance with hint tracks. On the other hand, the processor 652 may 
execute one or more routines to use a file with one or more hint tracks, or 
alternatively, to create one or more hint tracks, to process media (e.g., a pre- 
packaged movie, audio file, video file, etc.) for presentation or packetization 
according to the hint tracks. Such routines may be stored in the mass memory 
662, the memory 654, and/or another machine-readable medium accessible by the 
digital processing system 650. In one embodiment, the digital processing system 
650 may process media data having hint tracks embedded therein. Similarly, such 
embedded media data may be stored in the mass memory 662, the memory 654, 
and/or another machine-readable medium accessible by the digital processing 
system 650. 

Figure 8 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. The system 680 shown in 
Figure 8 includes a receiving system, which is depicted as a client data processing 
system 682 coupled to a server 694, via a data communication link 686. The 
server 694 and/or client data processing system may, for example, represent one 
or a combination of the devices/systems described with reference to Figures 6 and 
7. 

The server 694 includes a hint generation and processing unit 688, a media 
processing unit 690, and a data communication unit 692, each of which may 
include hard-wired circuitry or machine-executable instructions or a combination 
thereof. Furthermore, at least a portion of such hard-wired circuitry and/or 
machine-executable instructions may be shared between a combination of the hint 
generation and processing unit 688, the media processing unit 690, and the data 
communication unit 692. In one embodiment, at least one storage area/memory 
(e.g., a machine-readable medium) having appropriate routines and/or data stored 
therein coupled to at least one processor is utilized, at least in part, to implement 
one or a combination of the hint generation and processing unit 688, the media 
processing unit 690, and the data communication unit 692. 

In one embodiment, the hint generation and processing unit 688 creates 
and stores hints for packetization of media data processed by the media processing 
unit 690. As described above, the hints may be generated and stored as a separate 
file, relative to media files, or may be embedded with media data. If more than one 
media format is to be processed, an appropriate format may be taken into 
consideration by the hint generation and processing unit 688 to generate the hints. 
Information about the media format may be provided by the media processing unit 
690, which may also provide the media data (e.g., media files of video, audio, or 
video and audio, etc.). Similarly, the data communication unit 692 may provide 
one or more data communication (e.g., network) protocols for exchange of such 
media data, packetized according to the hints, via the data communication link 686. 
As such, the hint generation and processing unit may determine, based on media 
format information provided by the media processing unit 690 and data 
communication protocol information provided by the data communication unit 
692, appropriate hints and packetization of media and/or the hints for transfer to a 
receiving digital processing system, such as the client data processing system 682. 
In one embodiment, the streaming of the media and hints is done in accordance 
with the QuickTime format. 

In response to media data and hint packets received via the data 
communication link 686, the client data processing system 682 may present a 
media object represented by the media data. Such presentation may be performed 
ephemerally, as described above. In one embodiment of the invention, the media 
data may optionally be stored by the client data processing system 682 and 
reassembled, for example, at a later time, for presentation and/or transmission by 
the client data processing system 682. 

Figure 9 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. In particular, Figure 9 
depicts an embodiment of the invention wherein a separate digital processing 
system, referred to as a generator, may generate hints (or hint tracks) to provide to 
another system, such as a server, that uses the hints to packetize media data for 
transfer to another system, such as a client computer system. A system 696 is 
shown in Figure 9, which includes a server 700 which may exchange data, via the 
data communication link 686, with the client data processing system 682. 
However, in the embodiment shown in Figure 9, the server 700 does not generate 
the hints. Rather, a generator 710, coupled to the server 700 by a data 
communication link 708, includes a hint generation unit 712 to generate hints that 
are used to packetize media data. 

In one embodiment, the operation of the system 696 is as follows: the 
server 700 makes a request to the generator 710 to generate hints for one or more 
media files containing media data. For example, the media files may be stored in 
the server 700 on a machine-readable medium. The request may include 
information to indicate the format of the media file and/or a data communication 
protocol for transmission of the media data and/or other data. The data 
communication protocol may be related to the data communication link 686, which 
may, in one embodiment of the invention, be associated with a network connection 
having particular physical and logical characteristics to facilitate exchange of media 
and/or other data between the server 700 and the client data processing system 
682. In response to the request, the hint generation unit 712 generates appropriate 
hints, which may be associated with a time-related hint track, and provides the 
hints to the server 700. In response to the hints received from the generator 710, 
via the data communication link 708, the server 700, and in particular, a hint 
processing unit 702 uses the hints to packetize the media data for transmission to 
the client data processing system 682. 

In response to media data and hint packets received via the data 
communication link 686, the client data processing system 682 may present a 
media object represented by the media data. Such presentation may be performed 
ephemerally, as described above. In one embodiment of the invention, the media 
data may optionally be stored by the client data processing system 682 and 
reassembled, for example, at a later time, for presentation and/or transmission by 
the client data processing system 682. 

Figure 10 is a flow diagram illustrating a method for generating hints for 
providing media data transmission, according to one embodiment of the invention. 
In step 720, a media format is determined for media data to be transmitted, if more 
than one format will be used. If only one format is used, step 720 may not be 
performed. In step 722, an appropriate data communication protocol(s) is 
determined, again, assuming that more than one (protocol) may be used. In step 
724, based on the media format and the data communication protocol(s) (one or 
both of which may have been selected/configured), hints (e.g., hint tracks) related 
to media data transmission are created and stored. 
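Steps 720 through 724 can be sketched in miniature as below. The function `generate_hints`, its parameters, and the record layout are hypothetical; the media format and protocol are simply carried as labels, and the only real work shown is recording where each packet's payload lies in the media data, splitting any sample larger than an assumed maximum payload size.

```python
def generate_hints(sample_sizes, media_format, protocol, max_payload):
    """Create hint records for a list of sample sizes (in bytes).

    media_format: result of step 720 (label only in this sketch).
    protocol:     result of step 722 (label only in this sketch).
    Returns a list of hint records, one per packet (step 724).
    """
    hints = []
    sample_start = 0
    for size in sample_sizes:
        position = 0
        while position < size:
            length = min(max_payload, size - position)
            hints.append({
                "format": media_format,
                "protocol": protocol,
                "offset": sample_start + position,  # payload location in the media data
                "length": length,
            })
            position += length
        sample_start += size
    return hints


hints = generate_hints([1000, 3000], "H.261", "RTP", max_payload=1400)
spans = [(h["offset"], h["length"]) for h in hints]
```

The resulting list could be stored with the media or, as in optional step 726, transmitted to another system that performs the packetization.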

In step 726, which is optional, the hints may be transmitted to another 
digital processing system. In one embodiment of the invention, for example, the 
method of Figure 10, at least in part, may be performed exclusively by one digital 
processing system (e.g., a server). In an alternative embodiment, the method of 
Figure 10, at least in part, may be performed by two or more digital processing 
systems. For example, attributes of media data may be provided by a server or 
other system to another digital processing system, such as a generator. In 
response, the generator may determine, based on the attributes, an appropriate 
media format, data communication protocol(s), and hints for packetization of 
media data, which may be stored at the server. Alternatively, the server may 
provide the appropriate media format and protocol(s) to the generator, which could 
then generate hints. The generator may transmit the hints to the server or other 
digital processing system, which could packetize media data according to the hints. 

Figure 11 is a flow diagram illustrating a method of processing media data 
received by a receiving system in accordance with hints, according to one 
embodiment of the invention. In step 730, media data transmitted to a 
receiving system in accordance with hints or hint tracks is received by the 
receiving system. In one embodiment, the receiving system may receive 
packetized media data, as well as packetized hint tracks. The hint tracks, in one 
embodiment of the invention, may be associated with at least portions of the media 
data. Such data may be received by the receiving system in response to a request 
that may be made by the receiving system. For example, in one embodiment, the 
receiving system may be a client computer system and the request may be made to 
a server or other digital processing system for the media data. In response, the 
server may generate (or have generated for it by a separate digital processing 
system) hints for packetizing the media data, and transmit the packetized media 
data, which may include hints, to the receiving system. 

In step 732, a media object represented by the media data received by the 
receiving system is presented by the receiving system. For example, the media 
data may include video, audio, or a combination thereof that is "presented" by the
receiving system, for example, on a display and speaker(s). As mentioned above, 
the media data may be associated with a QuickTime movie. 

Optionally, in step 734, the media data, which may include hints, may be 
stored by the receiving system as a media file(s). Thus, in alternative 
embodiments of the invention, step 732 may not be performed as the media data is 
received, or may be performed before, after, or in parallel with step 734. 

In step 734, the stored media file may optionally be reassembled and/or 
presented. As such, step 732 may be performed subsequent to step 734. 

Figure 12 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a generator, according to one 
embodiment of the invention. It will be appreciated that the actual memory that 
stores the elements shown in and described below with reference to Figure 12 may 
be one or several elements, such as one or more disks (which may, for example, 
be magnetic, optical, magneto-optical, etc.), the memory 654 and/or the mass 
memory 662 described above with reference to Figure 7. Furthermore, in one 
embodiment where the generator, with which the machine readable storage 
medium shown in Figure 12 is associated, is a network computer, one or more of
the elements of the machine readable storage medium may be stored at another 
digital processing system and downloaded to the generator. Furthermore, the 
elements described with reference to the machine readable storage medium may, at 
some point in time, be stored in a non-volatile mass memory (e.g., a hard disk). 
Conversely, at other times, the elements of the machine storage medium may be 
dispersed between different storage areas, such as DRAM, SRAM, disk, etc. 

Figure 12 shows a machine readable storage medium 740. In one 
embodiment, the machine readable storage medium is utilized, at least in part, by a 
digital processing system that generates hints or hint tracks, i.e., a generator, in 
accordance with one or more method(s) of the invention. The generator, as 
described with reference to Figure 8, may be integrated into a digital processing 
system that transmits media data according to the hint tracks, or may be, as 
described with reference to Figure 9, a digital processing system that creates and 
provides the hints to another digital processing system, such as a server, which 
utilizes the hints to packetize and transmit media data. 

As shown in Figure 12, the machine readable storage medium 740 
typically includes a number of elements. For example, the machine readable 
storage medium 740 includes software for providing operating system 
functionality to the generator, as depicted by a generator operating system (OS) 
742. A network transmission routine(s) 748 provides data communication 
functionality, such as routines, protocols, etc., to allow the generator to transmit 
and receive data via a data communication link. 

In addition, the machine readable storage medium 740 includes routines 
and data for creating hints associated with media transmission. As such, the 
machine readable storage medium 740 may optionally include information 750, 
which may provide information relating to one or more data communication
protocols and media formats which may be necessary for creation of hints by a 
hint creation routine(s) 744. For example, the information 750 may include 
information relating to QuickTime movies, RTP, MPEG, etc. However, such 
information may, at least in part, be integrated into the hint creation routine 744 
and/or be provided to the generator by a remote digital processing system. 

The hints created by the hint creation routine(s) 744 may be stored as 
created hints 746 and/or stored/transmitted elsewhere (e.g., at a remote digital 
processing device, which may be a server). The hints are hint tracks that are time- 
related for packetization and transmission of media data, which is also time-related 
(e.g., video, audio, video and audio, etc.). 

Although the machine readable storage medium 740 is described with 
reference to a generator, the medium 740, at least in part, may be part of a number 
of types of digital processing systems, data storage media, etc. For example, the 
machine readable storage medium 740, at least in part, may be included as part of a 
server or other digital processing system. Furthermore, the machine readable 
storage medium 740, at least in part, may be included as part of a software utility 
on one or more disks or other machine readable media. 

Figure 13 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a server, according to one 
embodiment of the invention. It will be appreciated that the actual memory that 
stores the elements shown in and described below with reference to Figure 13 may 
be one or several elements, such as one or more disks (which may, for example be 
magnetic, optical, magneto-optical, etc.), the memory 654 and/or the mass 
memory 662 described above with reference to Figure 7. Furthermore, in one 
embodiment where the server, with which the machine readable storage medium 
shown in Figure 13 is associated, is a network computer, one or more of the
elements of the machine readable storage medium may be stored at another digital 
processing system and downloaded to the server. Furthermore, the elements 
described with reference to the machine readable storage medium may, at some 
point in time, be stored in a non-volatile mass memory (e.g., a hard disk). 
Conversely, at other times, the elements of the machine storage medium may be 
dispersed between different storage areas, such as DRAM, SRAM, disk, etc. 

Figure 13 shows a machine readable storage medium 760. In one 
embodiment, the machine readable storage medium is utilized, at least in part, to 
packetize media data for transmission on a data communication link in accordance 
with one or more method(s) of the invention. The machine readable storage 
medium 760 may be associated with a server, such as the server 694 described 
with reference to Figure 8, to include routines to create hint tracks and transmit 
media data according to the hint tracks. In another embodiment, the machine 
readable storage medium 760 may be associated with a digital processing system, 
such as the server 700 described with reference to Figure 9, wherein a digital 
processing system, such as a generator, includes routines to create hints, and the
server, using the hints as processed by routines provided by the machine readable 
storage medium 760, may packetize and transmit media data. 

The machine readable storage medium 760 includes a number of elements. 
For example, the machine readable storage medium 760 includes software for 
providing operating system functionality to the server, as depicted by a server 
operating system (OS) 762. A network transmission routine(s) 768 provides data 
communication functionality, such as routines, protocols, etc., to allow the server 
to transmit and receive data via a data communication link. 

In addition, the machine readable storage medium 760 includes a media
packetization routine 770 for packetizing media data, which may be time-related,
based on hints, which may also be packetized. Accordingly, the machine
readable storage medium 760 includes a media data storage area 764 and a hint 
storage area 766 to store media data (which may, for example, be QuickTime 
movies or other media tracks) and hints (e.g., hint tracks), respectively. The hints 
may include hint tracks that are time-related for packetization and transmission of 
media data, which is also typically time-related (e.g., video, audio, video and 
audio). In one embodiment, the hint tracks are packetized separately from the 
media data packets. In one embodiment, hints include pointer information 
identifying media data (e.g., a particular packet(s)) which may be in a separate 
media file. 

Figure 14 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a receiving system or other digital 
processing system, according to one embodiment of the invention. It will be 
appreciated that the actual memory that stores the elements shown in and described 
below with reference to Figure 14 may be one or several elements, such as one or 
more disks (which may, for example be magnetic, optical, magneto-optical, etc.), 
the memory 654 and/or the mass memory 662 described above with reference to 
Figure 7. Furthermore, in one embodiment where the receiving system, with 
which the machine readable storage medium shown in Figure 14 is associated, is a 
network computer, one or more of the elements of the machine readable storage 
medium may be stored at another digital processing system and downloaded to the 
receiving system. Furthermore, the elements described with reference to the 
machine readable storage medium may, at some point in time, be stored in a non- 
volatile mass memory (e.g., a hard disk). Conversely, at other times, the elements 
of the machine storage medium may be dispersed between different storage areas,
such as DRAM, SRAM, disk, etc. 

Figure 14 shows a machine readable storage medium 780. In one 
embodiment, the machine readable storage medium is utilized, at least in part, to 
process media data packetized in accordance with one or more method(s) of the 
invention. The machine readable storage medium 780 may be associated with a 
receiving system, such as the client data processing system 682 described with 
reference to Figures 8 and 9, to include routines to present media data 
transmitted/received according to hints. Alternatively, the machine readable 
storage medium 780 may include media data having hints (e.g., hint tracks) 
embedded therein. Such embedded media data may be pre-packaged or generated 
by a routine stored on a machine readable storage medium, such as the machine 
readable storage medium 780. 

The machine readable storage medium 780 may include a number of 
elements. For example, the machine readable storage medium 780 includes
software for providing operating system functionality to the receiving system, as
depicted by an operating system (OS) 772. A network transmission
routine(s) 782 provides data communication functionality, such as routines,
protocols, etc., to allow the receiving system to transmit and receive data via a
data communication link.

In addition, the machine readable storage medium 780 includes a media 
presentation routine 778 for presenting media data packetized according to hints. 
Thus, the machine readable storage medium 780, and in particular, the media 
presentation routine 778, may include routines for decompression of audio and/or 
video data, displaying of video, and/or playing back audio, etc. Furthermore, the 
media presentation routine 778 typically provides handling of hints that are 
associated with the media data. In one embodiment, the hints are simply ignored
as media is presented.

Optionally, the machine readable storage medium 780 may store media data
that has been packetized according to hints as media data 774, and include a media
data reassembly routine 776 to reassemble the stored media data (e.g., to be
presented, transmitted, etc.).

Figure 15 is a diagram of a data storage and/or communication medium
having stored/transported thereon media and hint information, according to one
embodiment of the invention. A data storage and/or communication medium 
(medium) 800 is shown, which represents various types of transport and/or 
storage medium in which a media data packet 804 and a hint packet 806 packetized 
according to the present invention could be stored or transported. For example, 
the medium 800 may represent the mass memory 662 and/or the memory 654, 
described above with reference to Figure 7. The medium 800 may also represent a 
communication medium, such as the LAN bus 612 shown in Figure 6 or the data 
communication link 686 for transporting data/signals representing media and/or 
other information. 

The hint packet 806 and the media packet 804 may be integrated into one 
packet or be stored and/or transported separately, as depicted in Figure 15. 
Furthermore, the hint packet 806 and the media packet 804 may embody several 
types of formats, such as ones described herein or one associated with other media 
formats, network protocols, and/or digital processing device architecture. 

Provided below are some example formats of hints. It will be appreciated 
that the present invention, however, may be utilized with various types of network 
protocols, digital processing system architectures, media formats, etc., to provide 
transmission of time-based data. 

Alternative Embodiments

While the invention has been described in terms of several embodiments 
and illustrative figures, those skilled in the art will recognize that the invention is 
not limited to the embodiments or figures described. In particular, the invention 
can be practiced in several alternative embodiments that provide packetization of 
time related media data. 

Therefore, it should be understood that the method and apparatus of the 
invention can be practiced with modification and alteration within the spirit and 
scope of the appended claims. The description is thus to be regarded as illustrative 
instead of limiting on the invention. 

Appendix A - Packetization Hint Sample Description

In one embodiment of the present invention, each hint track has a table of 
sample descriptions. Hint tracks typically have one sample description. The 
format for each sample description entry for a hint track, according to one 
embodiment of the present invention, is described below in Table 1. 



Table 1: Hint Track Sample Description Format

Hint Track Sample Description    Bytes
Sample description size          4
Data format                      4
Reserved                         6
Data reference index             2
Max packet size                  4
Additional data table            variable



The packetization hint header atom contains the following data elements:
Field descriptions:

Sample description size   A 32-bit integer that specifies the number of bytes
                          in the sample description.

Data format               A 32-bit integer indicating the format of the hints
                          stored in the sample data. Different formats may be
                          defined for different hint types. The table below
                          lists defined formats.

Reserved                  Six bytes that are set to 0.

Data reference index      A 16-bit integer that contains the index of the data
                          reference associated with the samples that use this
                          sample description. Data references are stored in
                          data reference atoms.

Max packet size           A 32-bit integer indicating the maximum size of
                          packets computed in this track.

Additional data table     A table containing additional information needed
                          on a per track basis. The values are tagged entries.
                          There are no required entries. If an entry is not
                          present in the table, a reasonable default may be
                          used.
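
As an illustration of the layout in Table 1, the following is a minimal sketch of how a reader might decode the fixed fields of a sample description entry. It assumes the fields are packed in the order listed, with no padding, in big-endian byte order; the function name and the returned dictionary keys are hypothetical and are not part of the format.

```python
import struct

# Assumed layout (big-endian, no padding), following the order of Table 1:
# size (4), data format (4), reserved (6), data reference index (2),
# max packet size (4), then the variable-length additional data table.
FIXED_LAYOUT = ">I4s6xHI"
FIXED_SIZE = struct.calcsize(FIXED_LAYOUT)  # 20 bytes


def parse_hint_sample_description(buf: bytes) -> dict:
    """Decode the fixed fields of one hint track sample description entry."""
    size, data_format, data_ref_index, max_packet_size = struct.unpack_from(
        FIXED_LAYOUT, buf, 0
    )
    return {
        "size": size,
        "data_format": data_format.decode("ascii"),
        "data_reference_index": data_ref_index,
        "max_packet_size": max_packet_size,
        # Whatever follows the fixed fields is the additional data table.
        "additional_data": buf[FIXED_SIZE:size],
    }


# A hand-built entry with no additional data table entries.
example = struct.pack(FIXED_LAYOUT, 20, b"rtp ", 1, 1450)
desc = parse_hint_sample_description(example)
```

Because the sample description size covers the whole entry, the additional data table is simply the bytes between the fixed fields and that size.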



The structure for the additional data table entries is shown in Table 2.

Table 2: Additional Data Table Format

Additional Data Table    Bytes
Entry length             4
Data type                4
Data                     Entry length - 8



The additional data table entries contain the following data elements:
Field descriptions:

Entry length    A 32-bit integer indicating the length of the entire
                entry (includes 8 bytes for the length and type fields)
                in bytes.

Data type       A 32-bit integer indicating the meaning of the data
                in the entry.

Data            The data for this entry. The length of the data is
                the value of the Entry length field minus 8 bytes.

The following data tags may be defined for several types of data
formats. Other tags may be created as required.

Length  Type    Data Description

9       'rely'  A 1 byte integer indicating whether or not
                this track should be sent over a reliable
                transport. Values of 0 and 1 are defined. If
                this tag is not present, it is assumed to have
                the value zero, indicating that it can be sent
                over unreliable transports, such as UDP.
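
The tagged entries described above can be walked with a simple length/type loop. The following is a minimal sketch, assuming big-endian integers and four-character ASCII type codes; the function names are hypothetical. The second entry in the example is a 12-byte 'tims' entry of the kind defined further below in this appendix.

```python
import struct


def parse_additional_data(table: bytes) -> dict:
    """Split an additional data table into {type code: raw data bytes}."""
    entries = {}
    pos = 0
    while pos + 8 <= len(table):
        # Each entry starts with a 4-byte length and a 4-byte type code.
        length, tag = struct.unpack_from(">I4s", table, pos)
        if length < 8 or pos + length > len(table):
            raise ValueError("malformed additional data entry")
        # The data occupies (entry length - 8) bytes after the two header fields.
        entries[tag.decode("ascii")] = table[pos + 8 : pos + length]
        pos += length
    return entries


def needs_reliable_transport(entries: dict) -> bool:
    """An absent 'rely' tag is treated as 0 (unreliable transport is acceptable)."""
    return entries.get("rely", b"\x00")[0] == 1


# Example table: a 9-byte 'rely' entry followed by a 12-byte 'tims' entry.
table = struct.pack(">I4sB", 9, b"rely", 1) + struct.pack(">I4sI", 12, b"tims", 90000)
entries = parse_additional_data(table)
```

Since no entry is required, a reader of the table supplies the default for any missing tag, as `needs_reliable_transport` does for 'rely'.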

The following data format types are defined. New types may be defined as 
needed. 

Data Format  Description

'rtp '       The packetization hints for sending media over RTP
             for the specific media type and encoding as
             described in various IETF drafts of the Audio-Video
             Transport (AVT) working group.

The following data tag is utilized in one embodiment for 'rtp ' data.

Length  Type    Data Description

12      'tims'  A 32-bit number indicating the RTP
                timescale. This tag is present in one
                embodiment for RTP data.

The following data tags are optional for 'rtp ' data.

Length  Type    Data Description

12      'tsro'  A 32-bit number indicating the
                random offset to add to the stored
                time stamp when sending the RTP
                packets. If this field is not present, a
                truly random number should be
                used, per the RTP specification.
                The value of this field could be zero,
                indicating that no random offset is to
                be added.

10      'snro'  A 16-bit number indicating the
                random offset to add to the sequence
                number when sending the RTP
                packets. If this field is not present, a
                truly random number should be
                used, as per the RTP specification.
                The value of this field could be zero,
                indicating that no random offset is to
                be added.
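
The following sketch shows one way a sender might apply the 'tsro' and 'snro' offsets. It assumes the tag data has already been extracted into a mapping from type code to raw big-endian bytes (a hypothetical representation), and it uses Python's `secrets` module as a stand-in for the random number called for when a tag is absent.

```python
import secrets
import struct


def rtp_offsets(entries: dict) -> tuple:
    """Return (timestamp offset, sequence offset) for one hint track."""
    if "tsro" in entries:
        ts_off = struct.unpack(">I", entries["tsro"])[0]
    else:
        ts_off = secrets.randbits(32)  # tag absent: pick a random 32-bit offset
    if "snro" in entries:
        seq_off = struct.unpack(">H", entries["snro"])[0]
    else:
        seq_off = secrets.randbits(16)  # tag absent: pick a random 16-bit offset
    return ts_off, seq_off


def on_the_wire(stored_ts: int, stored_seq: int, ts_off: int, seq_off: int) -> tuple:
    """Add the offsets; RTP time stamps wrap at 32 bits, sequence numbers at 16."""
    return (stored_ts + ts_off) & 0xFFFFFFFF, (stored_seq + seq_off) & 0xFFFF


# Example: a zero time stamp offset and a sequence offset that forces wraparound.
fixed = {"tsro": struct.pack(">I", 0), "snro": struct.pack(">H", 65535)}
ts_off, seq_off = rtp_offsets(fixed)
wire_ts, wire_seq = on_the_wire(1000, 2, ts_off, seq_off)
```

A stored offset of zero leaves the hinted value unchanged, which matches the case described above where no random offset is to be added.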

Appendix B - Example hint track for RTP

This section presents one example of a hint track format for streaming RTP
from a QuickTime movie.

In standard RTP, each media stream is typically sent as a separate RTP
stream. Multiplexing is generally achieved by using IP's port-level multiplexing,
not by interleaving the data from multiple streams into a single RTP session.
Therefore each media track in the movie should have an associated RTP hint track.
In one embodiment of the present invention, each hint track contains a track
reference back to the media track which it is streaming.

In this example, the packet size is determined at the time the hint track is 
created. Therefore, in the sample description for the hint track (a data structure 
which can contain fields specific to the 'coding' - which in this case is a protocol), 
the chosen packet size is indicated. In one example of the present invention, 
several RTP hint tracks are provided for each media track to provide different 
packet size choices. Other protocols may be parameterized as well. Similarly, the 
appropriate time-scale for the RTP clock is provided in the sample description 
below. 

The hint track is related to its base media track by a single track reference 
declaration. (RTP does not permit multiplexing of media within a single RTP 
stream). The sample description for RTP declares the maximum packet size which 
this hint track will generate. Session description (SAP/SDP) information is stored
in user-data atoms in the track.

Each sample in the RTP hint track contains the instructions to send out a
set of packets which must be emitted at a given time. The time in the hint track is 
emission time, not necessarily the media time of the associated media. 

The following description covers the internal structure of samples which, in
the terminology of this example, are media data rather than meta data, and
therefore need not be structured as objects.

In this example, each sample contains two areas: the instructions to
compose the packets, and any extra data needed when sending those packets (e.g.
an encrypted version of the media data).

struct RTPsample {
    int(16)   packetcount;
    RTPpacket packets[packetcount];
    byte[]    extradata;
}

Each RTP hint packet contains the information to send a single packet. In 
one embodiment, to separate media time from emission time, an RTP time stamp is 
specifically included, along with data needed to form the RTP header. In 
alternative embodiments, however, this is not the case. Other header information 
is typically supplied. A table of construction entries is constructed as follows: 

struct RTPpacket {
    int(32)   RTPtime;
    int(16)   partialRTPheader;
    int(16)   RTPsequenceseed;
    int(16)   entrycount;
    dataentry constructors[entrycount];
}

There are various forms of the constructor. Each constructor is 16 bytes,
which may make iteration relatively simple. The first byte is a union discriminator:

struct dataentry {
    int(8) entrytype;
    switch entrytype {
        case immediate:
            int(8)  bytecount;
            int(8)  bytestocopy[bytecount];
        case mediasample:
            int(8)  reserved[5];
            int(16) length;
            int(32) mediasamplenumber;
            int(32) mediasampleoffset;
        case hintsample:
            int(8)  reserved[5];
            int(16) length;
            int(32) hintsamplenumber;
            int(32) hintsampleoffset;
    }
}



The immediate mode permits the insertion of payload-specific headers (e.g.
the RTP H.261 header). For hint tracks where the media is sent 'in the clear', the
mediasample entry may specify the bytes to copy from the media track, by giving
the sample number, data offset, and length to copy. For relatively complex cases
(e.g. encryption or forward error correction), the transformed data may be placed
into the hint samples, and then hintsample mode may be used, in which case the
data would be provided from the extradata field in the RTPsample itself.

In one example of the present invention, there is no requirement that 
successive packets transmit successive bytes from the media stream. For example, 
to conform with RTP-standard packing of H.261, in one example of the present 
invention, a byte may be sent at the end of one packet and also at the beginning of 
the next (when a macroblock boundary falls within a byte). 
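
The overlap described above can be sketched with two packets whose mediasample constructors cover a shared byte. This is an illustration only: the tuple representation of constructors, the function name, and the header bytes are hypothetical placeholders, not the stored 16-byte format.

```python
def build_payload(media_sample: bytes, constructors: list) -> bytes:
    """Assemble one packet payload from a list of simplified constructors."""
    payload = bytearray()
    for entry in constructors:
        if entry[0] == "immediate":
            # Literal bytes carried in the hint itself (e.g. a payload header).
            payload += entry[1]
        elif entry[0] == "mediasample":
            # Copy `length` bytes from the media sample starting at `offset`.
            _, offset, length = entry
            payload += media_sample[offset : offset + length]
        else:
            raise ValueError("unknown constructor type: %r" % (entry[0],))
    return bytes(payload)


media = bytes(range(10))          # stand-in for one media sample
header = b"\xAA\xBB\xCC\xDD"      # placeholder payload-specific header bytes

# Byte 4 is sent at the end of the first packet and again at the start of
# the second, as when a macroblock boundary falls within that byte.
packet1 = build_payload(media, [("immediate", header), ("mediasample", 0, 5)])
packet2 = build_payload(media, [("immediate", header), ("mediasample", 4, 6)])
```

Because each constructor names its own offset and length, nothing forces the ranges of successive packets to be disjoint or contiguous.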

Appendix C - Packetization Hint Sample Data for Data Format 'rtp '

This appendix provides a description of the sample data for the 'rtp '
format, according to one embodiment of the invention. The 'rtp ' format assumes
that a server is sending data using Real Time Transport Protocol (RTP). This
format assumes that the server knows about RTP headers, but does not require
that the server know anything about specific media headers, including media
headers defined in various IETF drafts.

In one embodiment of the present invention, each sample in the hint track 
will generate one or more RTP packets. Each entry in the sample data table in a 
hint track sample corresponds to a single RTP packet. Samples in the hint track 
may or may not correspond exactly to samples in the media track. In one 
embodiment of the present invention, data in the hint track sample is byte aligned, 
but not 32-bit aligned. 

Field descriptions:

Entry count          A 16-bit unsigned integer indicating the number of
                     packet entries in the table. Each entry in the table
                     corresponds to a packet. Multiple entries in a single
                     sample indicate that the media sample had to be split
                     into multiple packets. A sample with an entry count
                     of zero is reserved and if encountered, should be
                     skipped.

Packet entry table   A variable length table containing packet entries.
                     Packet entries are defined below.

Additional data      A variable length field containing data pointed to by
                     the entries in the data table shown below by Table 3:

Table 3 - Additional Data

Packet Entry                         Bytes
Relative packet transmission time    4
Flags                                4
RTP header info                      2
RTP sequence number                  2
Entry count                          2
Data table                           variable



In one embodiment, the packet entry contains the following data elements:
Field descriptions:

relative packet      A 32-bit signed integer value, indicating the time,
transmission time    in hint track's timescale, to send this packet relative
                     to the hint sample's actual time. Negative values
                     mean that the packet will be sent earlier than real
                     time, which is useful for smoothing the data rate.
                     Positive values are useful for repeating packets at
                     later times. Within each hint sample track, each
                     packet time stamp is nondecreasing.

flags                A 32-bit field indicating certain attributes for this
                     packet.

The flags field contains the following element:

Field   Bit #   Description

R       31      A 1-bit number indicating that this is a
                repeat packet - the data has been defined in a
                previous packet. A server may choose to skip
                repeat packets to help it catch up when it is behind
                in its transmission of packets. All repeated packets
                for a given packet are in the same hint sample.

All undefined bits (0-30) are reserved and are set to zero.

RTP header info      A 16-bit integer specifying various values to be set
                     in the RTP header.

The RTP header information field contains the following elements:

Field          Bit #   Description

P              2       A 1-bit number corresponding to the
                       padding (P) bit in the RTP header. This bit
                       may not be set, since a server that needed
                       different packet padding may generally need
                       to un-pad and re-pad the packet itself.

X              3       A 1-bit number corresponding to the
                       extension (X) bit in the RTP header. This
                       bit may not be set, since a server that needs
                       to send its own RTP extension may either
                       not be able to, or may be forced to replace
                       any extensions from the hint track.

M              8       A 1-bit number corresponding to the marker
                       (M) bit in the RTP header.

payload type   9-15    A 7-bit number corresponding to the
                       payload type (PT) field of the RTP header.

All undefined bits (0-1 and 4-7) are reserved and are set to zero.
The defined bits occupy the same bit locations as in the
RTP header.
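
The bit positions listed above can be turned into shifts and masks as in the sketch below. It rests on one assumption that should be checked against the RTP specification in use: bit 0 is taken to be the most significant bit of the 16-bit field, following the way RTP header diagrams number bits. The function names are hypothetical.

```python
# Assumption: bit 0 is the most significant bit of the 16-bit field, so
# bit n corresponds to a right shift of (15 - n).
RESERVED_MASK = 0b1100_1111_0000_0000  # bits 0-1 and bits 4-7


def decode_rtp_header_info(info: int) -> dict:
    """Extract the P, X, M and payload type values from the 16-bit field."""
    return {
        "padding": (info >> 13) & 1,    # bit 2
        "extension": (info >> 12) & 1,  # bit 3
        "marker": (info >> 7) & 1,      # bit 8
        "payload_type": info & 0x7F,    # bits 9-15
    }


def reserved_bits_clear(info: int) -> bool:
    """True when all undefined bits are zero, as the format expects."""
    return (info & RESERVED_MASK) == 0


# Example: marker bit set, payload type 96, no padding or extension.
info = (1 << 7) | 96
fields = decode_rtp_header_info(info)
```

Under the same assumption, the field can be merged into the first two bytes of the outgoing header by OR-ing it with the version bits, which the server supplies itself.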

RTP sequence    A 16-bit integer specifying the RTP sequence number for
number          the packet. The RTP server adds a random offset to this
                sequence number before transmitting the packet. This field
                allows re-transmission of packets, e.g., the same packet
                can be assembled with the same sequence number and a
                different (later) packet transmission time. For example, a
                text sample with a duration of 5 minutes can be
                retransmitted every 10 seconds so that clients that miss the
                original sample transmission (perhaps they started playing a
                movie in the middle) will be "refreshed" after a maximum
                of 10 seconds.
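
The retransmission example can be worked through as a schedule of packet entries that share one sequence number and differ only in relative transmission time and in the repeat (R) flag. The sketch below is illustrative: the tuple layout, the function name, the timescale of 600 units per second, and the sequence number are all hypothetical.

```python
def repeat_schedule(duration: int, interval: int, sequence_number: int) -> list:
    """Return (relative time, sequence number, is_repeat) tuples for one sample.

    Times are in hint track timescale units, relative to the hint sample time.
    """
    entries = []
    t = 0
    while t < duration:
        # The first entry is the original packet; later ones are repeats.
        entries.append((t, sequence_number, t > 0))
        t += interval
    return entries


timescale = 600  # hypothetical hint track timescale (units per second)
schedule = repeat_schedule(300 * timescale, 10 * timescale, 42)
```

For a 5 minute sample refreshed every 10 seconds this yields 30 entries: the original packet plus 29 repeats, with relative transmission times that never decrease.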

Entry count     A 16-bit unsigned integer specifying the number of entries
                in the data table.

Data table      A table that defines the data to be put in the payload portion
                of the RTP packet. This table defines various places the
                data can be retrieved, and is shown by Table 4.

Table 4 - Data Table

Data table entry    Bytes
Data source         1
Data                15



The data source field of the entry table indicates how the other 15 bytes of the 
entry are to be interpreted. Values of 0 through 4 are defined. The various data 
table formats are defined below. Although there are various schemes, the entries 
in the various schemes are typically 16 bytes long. 

No-Op Data Mode 

The data table entry has the following format for no-op mode:
Field description:

Data source = 0     A value of zero indicates that this data table entry is
                    to be ignored.

Immediate Data Mode

The data table entry has the following format for immediate mode:
Field description:

Data source = 1     A value of one indicates that the data is to be
                    immediately taken from the bytes of data that
                    follow.

Immediate length    An 8-bit integer indicating the number of bytes to
                    take from the data that follows. Legal values range
                    from 0 to 14.

Immediate data      14 bytes of data to place into the payload portion of
                    the packet. Only the first number of bytes indicated
                    by the immediate length field are used.
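
An immediate mode entry can be built and read back as in the following sketch. The function names are hypothetical, and padding the unused data bytes with zeros is an assumption; the description above only says that bytes beyond the immediate length are not used.

```python
ENTRY_SIZE = 16          # every data table entry is 16 bytes
MAX_IMMEDIATE = 14       # 16 bytes minus the data source and length bytes


def pack_immediate_entry(data: bytes) -> bytes:
    """Build a 16-byte immediate mode entry carrying up to 14 bytes."""
    if len(data) > MAX_IMMEDIATE:
        raise ValueError("immediate data is limited to 14 bytes")
    # Data source = 1, then the length, then the data padded out to 14 bytes.
    return bytes([1, len(data)]) + data.ljust(MAX_IMMEDIATE, b"\x00")


def unpack_immediate_entry(entry: bytes) -> bytes:
    """Return only the bytes selected by the immediate length field."""
    if len(entry) != ENTRY_SIZE or entry[0] != 1:
        raise ValueError("not an immediate mode entry")
    length = entry[1]
    if length > MAX_IMMEDIATE:
        raise ValueError("immediate length out of range")
    return entry[2 : 2 + length]


entry = pack_immediate_entry(b"\x01\x02\x03")
payload_bytes = unpack_immediate_entry(entry)
```

Keeping the entry at a fixed 16 bytes, even when fewer than 14 data bytes are carried, lets a reader step through the data table without inspecting each entry's mode first.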



Sample Mode 

The data table entry has the following format for sample mode:
Field description:

Data source = 2     A value of two indicates that the data is to be taken from
                    a track's sample data.

Track ref index     A value that indicates which track the sample data will
                    come from. A value of zero means that there is exactly
                    one media track referenced, which is to be used.
                    Values from 1 to 127 are indices into the hint track
                    reference atom entries, indicating from which original
                    media track the sample is to be read. A value of -1
                    means the hint track itself, i.e., the sample from the
                    same track as the hint sample currently being parsed is
                    used.

Bytes per           A 16-bit unsigned integer specifying the number of
compression block   bytes that results from compressing the number of
                    samples in the Samples per compression block field. A
                    value of zero is equivalent to a value of 1.

Samples per         A 16-bit unsigned integer specifying the uncompressed
compression block   samples per compression block. A value of zero is
                    equivalent to a value of 1.

Length              A 16-bit integer specifying the number of bytes in the
                    sample to copy.

Sample Number       A 32-bit integer specifying sample number of the track.

Offset              A 32-bit integer specifying the offset from the start of
                    the sample from which to start copying. If referencing
                    samples in the hint track, this will generally point into
                    the Additional Data area.

If the bytes per compression block and/or the samples per compression
block is greater than 1, then this ratio is used to translate a sample number into an
actual byte offset. This ratio mode is typically used for compressed audio tracks in 
QuickTime movies, such that: 

CB = NS * BPCB / SPCB 

wherein, 

CB = compressed bytes 

NS = number of samples 

BPCB = bytes per compression block 

SPCB = samples per compression block 

For example, a GSM compression block is typically 160 samples packed into 33
bytes. Therefore, BPCB = 33 and SPCB = 160. Suppose the hint sample requests 33
bytes of data starting at the 161st media sample, and that the first
QuickTime chunk contains at least 320 samples. After determining that this data
will come from chunk 1, and where chunk 1 starts, the ratio is utilized to adjust the
offset into the file where the requested samples will be found:

chunk_number = 1; /* calculated by walking the sample-to-chunk atom */
first_sample_in_this_chunk = 1; /* also calculated from that atom */
chunk_offset = chunk_offsets[chunk_number]; /* from the stco atom */
data_offset = (sample_number - first_sample_in_this_chunk) * BPCB / SPCB;
read_from_file(chunk_offset + data_offset, length); /* read our data */
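
The GSM figures above can be checked numerically. The following sketch reproduces the offset calculation with integer arithmetic; the function names and the chunk offset of 4096 are hypothetical, and a zero in either ratio field is treated as 1, as the field descriptions specify.

```python
def sample_to_byte_offset(sample_number: int, first_sample_in_chunk: int,
                          bpcb: int, spcb: int) -> int:
    """Translate a sample number into a byte offset within its chunk."""
    bpcb = bpcb or 1  # a value of zero is equivalent to 1
    spcb = spcb or 1
    return (sample_number - first_sample_in_chunk) * bpcb // spcb


def compressed_bytes(number_of_samples: int, bpcb: int, spcb: int) -> int:
    """CB = NS * BPCB / SPCB, using integer division."""
    return number_of_samples * (bpcb or 1) // (spcb or 1)


# GSM: 160 samples are packed into 33 bytes.
BPCB, SPCB = 33, 160
data_offset = sample_to_byte_offset(161, 1, BPCB, SPCB)

chunk_offset = 4096  # hypothetical file offset of chunk 1
read_position = chunk_offset + data_offset
```

The 161st sample is the first sample of the second GSM block, so the offset lands exactly one 33-byte block into the chunk.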



Sample Description Mode 

The data table entry has the following format for sample description mode: 



Field description:

Data source = 3      A value of three indicates that the data is to be taken
                     from the media track's sample description table.

Track ref index      A value that indicates which track the sample data will
                     come from. A value of zero means that there is exactly
                     one hint track reference, which is to be used. Values
                     from 1 to 127 are indices into the hint track reference
                     atom entries, indicating from which original media track
                     the sample is to be read. A value of -1 means the hint
                     track itself, i.e., the sample description from the same
                     track as the hint sample currently being parsed is
                     utilized.

Reserved             Four bytes that are set to zero.

Length               A 16-bit integer specifying the number of bytes in the
                     sample to copy.

Sample description   A 32-bit integer specifying the index into the media's
index                sample description table.

Offset               A 32-bit integer specifying the offset from the start of
                     the sample from which to start copying.

Additional data      A variable length field containing data pointed to by hint
                     track sample mode entries in the data table.
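The interpretation of the track ref index field can be sketched in C as follows. This is only an illustration: the function name resolve_track_ref, the use of 0 as an "invalid" sentinel, and the reading of values 1 to 127 as 1-based indices into the reference entries are assumptions, not statements of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_REF_INVALID 0u   /* assumed sentinel: no usable track */

/* Hypothetical resolver for the one-byte "Track ref index" field.
 *   index  0      -> the single hint track reference
 *   index  1..127 -> index (assumed 1-based) into the reference entries
 *   index -1      -> the hint track itself
 * ref_ids/ref_count model the hint track reference atom entries. */
uint32_t resolve_track_ref(int8_t index, const uint32_t *ref_ids,
                           size_t ref_count, uint32_t hint_track_id)
{
    if (index == -1)
        return hint_track_id;
    if (index == 0)
        return (ref_count == 1) ? ref_ids[0] : TRACK_REF_INVALID;
    if (index >= 1 && (size_t)index <= ref_count)
        return ref_ids[index - 1];
    return TRACK_REF_INVALID;   /* other negative or out-of-range values */
}
```

A value of zero is rejected here when more than one reference exists, because the text defines zero only for the case of exactly one hint track reference.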

Appendix D - Example hint track format for MPEG-2 Transport

This section presents one example of a simple track format for streaming 
MPEG-2 transport from a QuickTime movie holding elementary streams. 

An MPEG-2 transport stream is associated with a multiplex of one or more 
elementary streams. For this reason, an MPEG-2 transport hint track describes 
how to construct such a multiplex from one or more media tracks. There is not 
necessarily a one to one relationship between media tracks and MPEG-2 transport 
hint tracks. Each hint track may contain references to the elementary streams it 
represents. In one example of the present invention, a QuickTime file might 
contain multiple such hint tracks to describe different multiplexes. 

Packet size is generally not an issue, since all MPEG-2 transport packets 
are 188 bytes in size. In one example of the present invention, each transport 
packet (in the MPEG-2 transport protocol) contains payload data from one media 
track. This allows for a relatively simple hint description for each transport 
packet. In one example of the present invention, each such hint describes which 
header data appears on each transport packet, and then points to the payload in the 
appropriate media track for the transport packet. For packets which do not 
correspond with a media track, such as PSI packets, the hint may describe 188 
bytes of header data, and any media track reference may be considered irrelevant. 
For packets which do correspond with a media track, the header data may account 
for information such as transport headers, possible adaptation headers, and PES
headers for transport packets that begin PES packets. 

Reference is made to the MPEG-2 transport hint track in the Sample 
Description Atom (of type 'stsd'). This atom includes a sample description table, 
and the entries in this table differ based on the media type. In one example of the 
present invention, hint tracks begin with the structure shown in Table 1. The 
additional data table may hold entries with the structure shown in Table 2: 

In one example of the present invention, if the hint track is an MPEG-2
transport hint track, the data format in the hint track sample description entry will
be 'm2t' and the max packet size will always be 188. In such a description entry,
the types shown below in Tables 5-7 may be found in the additional data table:

Table 5 - Additional Data Table Entries

Entry length  Data type   Data description

8             0x00000000  Indicates there are no more entries in the table.

9             'otyp'      Describes how offsets are described in the
                          hints. The one byte of data has values
                          described below in figure B.4. This entry is
                          mandatory in the additional data table.

9             'msns'      Describes the size of media sample numbers.
                          The one byte of data indicates how many bytes
                          are used to specify media sample numbers. If
                          this is not present, and media sample numbers
                          are present in the sample data, the default
                          value is 4 bytes.

9             'msos'      Describes the size of media sample offsets.
                          The one byte of data indicates how many bytes
                          are used to specify media sample offsets. If
                          this is not present, and media sample offsets
                          are present in the sample data, the default
                          value is 4 bytes.

9             'fosz'      Describes the size of file offsets. The one
                          byte of data indicates how many bytes are used
                          to specify file offsets within samples. If this is
                          not present, and file offsets are present in the
                          sample data, the default value is 4 bytes.

Variable      'tmap'      Describes an abbreviated mapping of media
                          tracks. Each 5 byte entry maps a 4 byte track
                          ID to a 1 byte track reference number. This
                          limits any given transport mux to containing
                          no more than 256 media tracks, but this should
                          not be a limiting factor, and this compression
                          is useful in limiting the size of the hint track.
                          The format of these 5 byte entries is specified
                          below in figure B.5. This entry is mandatory
                          in the additional data table.



Table 6 - 'otyp' Values In the Additional Data Table

Value   Description

0       Samples are described in terms of media samples

1       Samples are described in terms of file offsets



Table 7 - Format of Entries in the 'tmap' Additional Data Entry

Length  Description

4       Original Track ID

1       Abbreviated track reference number used in samples
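
The 5 byte 'tmap' records of Table 7 could be searched as in the following C sketch. The function name tmap_lookup is hypothetical, and big-endian storage of the 4 byte track ID is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup in the data of a 'tmap' entry: a packed array of
 * 5-byte records, each a 4-byte track ID followed by a 1-byte abbreviated
 * track reference number. Big-endian track IDs are an assumption.
 * Returns 1 and stores the track ID on success, 0 if ref is not mapped. */
int tmap_lookup(const uint8_t *data, size_t len, uint8_t ref,
                uint32_t *track_id)
{
    for (size_t i = 0; i + 5 <= len; i += 5) {
        if (data[i + 4] == ref) {
            *track_id = ((uint32_t)data[i] << 24) |
                        ((uint32_t)data[i + 1] << 16) |
                        ((uint32_t)data[i + 2] << 8) |
                        (uint32_t)data[i + 3];
            return 1;
        }
    }
    return 0;
}
```

Because the reference number is a single byte, a linear scan over at most 256 records is sufficient; a server could equally build a 256-entry array once per hint track.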

In one example of the present invention, each hint sample describes one
transport packet. Each transport packet can be described as some amount of 
header data, followed by some amount of payload from one media track. Since 
MPEG-2 transport packets are relatively small, a large number of hint samples 
may be generated, and thus, these samples preferably should be as small as 
possible. Several entries in the additional data table above may be used to 
minimize the size of samples, but such factors may make some of the fields in the 
sample entries variable in size. 

If the 'otyp' entry in the data table has the value 0, indicating that payload 
data is described in terms of media samples, hint samples may be of the following 
form shown in Table 8: 



Table 8 - Hint Sample Format Using Media Sample References

Length    Description

1         Track reference number of the media track holding the payload
          data for this packet. This can be mapped to a track ID using the
          'tmap' entry in the additional data table. If the hint specifies 188
          bytes of immediate data, this field is irrelevant.

1         The length of the immediate data for the packet. Note that this
          must be 188 or less, since transport packets are 188 bytes in
          length.

Variable  Bytes of immediate data to be used as the header for the transport
          packet. The number of bytes is described by the previous field.

Variable  The media sample number to use for the payload data. The
          default size of this field is 4 bytes, but may be modified by the
          presence of an 'msns' entry in the additional data table.

Variable  The media sample offset to use for the payload data. The default
          size of this field is 4 bytes, but may be modified by the presence
          of an 'msos' entry in the additional data table.



In one example of the present invention, it is not necessary to indicate the 
length of the payload data for the packet since in MPEG-2, this length is equal to 
188 minus the size of the header data for the packet. 
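
A minimal C sketch of decoding one hint sample laid out as in Table 8 is given below. The structure TsHint and the functions read_be and parse_ts_hint are hypothetical names, and big-endian field encoding is an assumption; the payload length is derived as 188 minus the header length, as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188

/* Hypothetical decoded view of one hint sample (media sample mode). */
typedef struct {
    uint8_t        track_ref;     /* abbreviated track reference number */
    uint8_t        header_len;    /* bytes of immediate (header) data */
    const uint8_t *header;        /* points into the caller's buffer */
    uint32_t       sample_number; /* media sample holding the payload */
    uint32_t       sample_offset; /* offset of the payload in that sample */
    uint32_t       payload_len;   /* 188 minus header_len */
} TsHint;

/* Reads an unsigned big-endian integer of 1..4 bytes (byte order assumed). */
uint32_t read_be(const uint8_t *p, size_t nbytes)
{
    uint32_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Parses one hint sample. msns and msos are the field widths in bytes
 * (4 by default per the text). Returns bytes consumed, or 0 if malformed. */
size_t parse_ts_hint(const uint8_t *buf, size_t len,
                     size_t msns, size_t msos, TsHint *out)
{
    if (len < 2 || msns < 1 || msns > 4 || msos < 1 || msos > 4)
        return 0;
    out->track_ref  = buf[0];
    out->header_len = buf[1];
    if (out->header_len > TS_PACKET_SIZE)
        return 0;                       /* header cannot exceed a packet */
    size_t pos = 2;
    if (len - pos < (size_t)out->header_len + msns + msos)
        return 0;                       /* sample is truncated */
    out->header = buf + pos;
    pos += out->header_len;
    out->sample_number = read_be(buf + pos, msns);
    pos += msns;
    out->sample_offset = read_be(buf + pos, msos);
    pos += msos;
    out->payload_len = TS_PACKET_SIZE - out->header_len;
    return pos;
}
```

The widths msns and msos would be taken from the 'msns' and 'msos' entries of the additional data table when present, which is why they are passed in rather than fixed.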

If the 'otyp' entry in the data table has the value 1, indicating that payload
data is described in terms of file offsets, hint samples may be of the following 
form shown in Table 9: 



Table 9

Length    Description

1         Track reference number of the media track holding the payload
          data for this packet. This can be mapped to a track ID using the
          'tmap' entry in the additional data table. If the hint specifies 188
          bytes of immediate data, this field is irrelevant.

1         The length of the immediate data for the packet. Note that this
          must be 188 or less since transport packets are 188 bytes in
          length.

Variable  Bytes of immediate data to be used as the header for the transport
          packet. The number of bytes is described by the previous field.

Variable  The file offset where the payload data is located. This offset is
          in the file where the data for the media track is located. The
          default size of this field is 4 bytes, but may be modified by the
          presence of an 'fosz' entry in the additional data table.



In one example of the present invention, hint samples may describe their 
offsets in terms of media samples or in terms of file offsets. Each of these has 
advantages and disadvantages. If hint samples specify payload in terms of media 
samples, they may be more resilient to additional editing of the file containing the 
media track, but may require additional processing for delivery. If hint samples 
specify payload in terms of file offsets, the payload data can be accessed relatively 
quickly, but any editing of the file containing the media track may invalidate the 
hints. 



Appendix D - An example file

Provided below is a relatively short (six frame) sample file, with some of
the relatively less important fields and objects left out (marked here by ellipsis
"..."), and with some fictitious numbers to illustrate the overall structure of a file
which is ready for streaming over RTP, according to one embodiment of the
present invention. The media data has been left out; only the meta-data is shown.



moov -- the entire movie meta-data
  mvhd -- overall movie information
    TIME-SCALE 600
    DURATION 2792
    PREFERRED-RATE 1
    VOLUME 255
    MATRIX [[1 0 0] [0 1 0] [0 0 1]]
    NEXT-TRACK-ID 5 -- tracks 1 to 4 are here

  trak -- this is the video track
    tkhd
      TRACK-ID 1
      DURATION 2792
      LAYER 0
      MATRIX [[1 0 0] [0 1 0] [0 0 1]]
      WIDTH 176
      HEIGHT 144
    mdia
      mdhd
        TIME-SCALE 600
        DURATION 2722
      hdlr -- we use the basic video media handler
        TYPE mhlr
        SUBTYPE vide
        MANUFACT appl
        NAME Apple Video Media Handler
      minf
        vmhd
        hdlr -- basic 'alias' disk data handler gets the data
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl -- the complete sample table
          stsd -- the sample description(s)
            ENTRY-COUNT 1
            DESCRIPTIONS [video sample description]
          stts -- convert time to sample
            ENTRY-COUNT 6
            TIMETOSAMPLE ((1 200)
                          (1 251)
                          (1 479)
                          (1 531)
                          (1 1022)
                          (1 239))
              -- count, duration
          stss -- 'sync' or key sample numbers
            ENTRY-COUNT 1
            SYNCSAMPLES (1)
          stsc -- sample to chunk
            ENTRY-COUNT 1
            SAMPLETOCHUNK ((1 1 1))
              -- 1st chunk, samples/chunk, desc. number
          stsz -- sample sizes
            DEFSAMPLESIZE 0 -- no default size, all different
            ENTRY-COUNT 6
            SAMPLESIZES (664
                         616
                         1176
                         1304
                         2508
                         588)
          stco -- chunk offsets into file
            ENTRY-COUNT 6
            CHUNKOFFSETS (4743
                          5407
                          8010
                          12592
                          17302
                          25268)



  trak -- this is the sound track
    tkhd
      TRACK-ID 2
      DURATION 2792
      VOLUME
    mdia
      mdhd
        TIME-SCALE 8000
        DURATION 37280
        LANGUAGE US English
      hdlr -- handled by the basic sound handler
        TYPE mhlr
        SUBTYPE soun
        MANUFACT appl
        NAME Apple Sound Media Handler
      minf
        smhd
          BALANCE 0
        hdlr -- data fetched by usual disc data handler
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl -- sample table for the sound
          stsd -- sample descriptions
            ENTRY-COUNT 1
            DESCRIPTIONS [Sound sample description, incl GSM]
          stts -- time to sample table
            ... sound is measured by uncompressed samples
            ENTRY-COUNT 1
            TIMETOSAMPLE ((37280 1))
          stsc
            ENTRY-COUNT 2
            SAMPLETOCHUNK ((1 4000 1)
                           (10 1280 1))
              -- first chunk, samples/chunk, desc. number
          stsz
            DEFSAMPLESIZE 1 -- all samples same size
            ENTRY-COUNT 37280
          stco -- chunk offset table
            ENTRY-COUNT 10
            CHUNKOFFSETS (3093
                          3918
                          6023
                          9186
                          10915
                          13896 ...)

  trak -- the RTP hints for the video track
    tkhd
      TRACK-ID 3
      DURATION 2792
    tref
      hint -- references the video track
        TRACKIDS (1)
    mdia
      mdhd
        TIME-SCALE 600
        DURATION 2792
      hdlr -- is 'played' by the hint media handler
        TYPE mhlr
        SUBTYPE hint
        MANUFACT appl
        NAME hint media handler
      minf
        gmhd
        hdlr -- if played, the regular disc handler would fetch data
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl -- samples describe packets
          stsd
            ENTRY-COUNT 1
            DESCRIPTIONS [hint sample description]
          stts -- one packet per frame for video
            ENTRY-COUNT 6
            TIMETOSAMPLE ((1 270)
                          (1 251)
                          (1 479)
                          (1 531)
                          (1 1022)
                          (1 239))
          stss -- key sample derive from video
            ENTRY-COUNT 1
            SYNCSAMPLES (1)
          stsc -- sample to chunk table
            ENTRY-COUNT 1
            SAMPLETOCHUNK ((1 1 1))
          stsz -- sample sizes (packet instructions)
            DEFSAMPLESIZE 0
            ENTRY-COUNT 6
            SAMPLESIZES (52
                         52
                         52
                         52
                         102
                         52)
          stco -- chunk offsets
            ENTRY-COUNT 6
            CHUNKOFFSETS (6848
                          6900
                          10011
                          14721
                          20635
                          25856)
    udta -- track is named for ease of identification
      name
        NAME Hinted Video Track

  trak -- the RTP hints for the sound track
    tkhd
      TRACK-ID 4
    tref -- references the sound track
      hint
        TRACKIDS (2)
    mdia
      mdhd
        TIME-SCALE 8000
        DURATION 37120
      hdlr
        TYPE mhlr
        SUBTYPE hint
        MANUFACT appl
        NAME hint media handler
      minf
        gmhd
        hdlr
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl
          stsd
            ENTRY-COUNT 1
            DESCRIPTIONS [hint sample description]
          stts -- time to sample
            ENTRY-COUNT 4
            TIMETOSAMPLE ((1 960)
                          (7 4000)
                          (1 1120)
                          (1 7040))
          stsc
            ENTRY-COUNT 1
            SAMPLETOCHUNK ((1 1 1))
          stsz
            DEFSAMPLESIZE 0
            ENTRY-COUNT 10
            SAMPLESIZES (206
                         852
                         852
                         852
                         852
                         852 ...)
          stco
            ENTRY-COUNT 10
            CHUNKOFFSETS (6952
                          7158
                          10063
                          11740
                          14773
                          16450 ...)
    udta
      NAME Hinted Sound Track
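
The figures in the listing above can be cross-checked with a short C sketch. The type and function names below are hypothetical; the sketch only sums the time-to-sample entries and counts the samples implied by a sample-to-chunk table, using the values shown in the example file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t count, duration; } SttsEntry;   /* one stts row */
typedef struct { uint32_t first_chunk, samples_per_chunk, desc; } StscEntry;

/* Sum of count * duration over all time-to-sample entries. */
uint64_t stts_total_duration(const SttsEntry *e, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)e[i].count * e[i].duration;
    return total;
}

/* Total samples implied by a sample-to-chunk table for chunk_count chunks.
 * Each entry applies from its first_chunk up to the next entry's. */
uint64_t stsc_total_samples(const StscEntry *e, size_t n, uint32_t chunk_count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t next = (i + 1 < n) ? e[i + 1].first_chunk : chunk_count + 1;
        total += (uint64_t)(next - e[i].first_chunk) * e[i].samples_per_chunk;
    }
    return total;
}
```

Applied to the listing, the sound hint track's four time-to-sample entries sum to 37120, matching its media duration, and the sound track's sample-to-chunk table over 10 chunks gives 9 * 4000 + 1280 = 37280 samples, matching its sample count.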

CLAIMS

What is claimed is:

1. A machine readable medium having stored therein:

a set of data comprising at least one instruction to cause a digital processing 
system to generate a set of packets representing a time related 
sequence of media data, wherein said set of packets is associated 
with a transmission protocol, and wherein said set of data 
comprises a time related sequence of data associated with said time 
related sequence of media data. 

2. The machine readable medium of claim 1, wherein said set of data is
associated with a track of indicating data, and wherein said transmission protocol
comprises a packet data protocol.

3. The machine readable medium of claim 1, wherein said set of data is
further associated with a QuickTime file.

4. The machine readable medium of claim 1, wherein said set of data is a time
related sequence of data associated with and separate from at least one of a
sequence of audio data and video data of said time related sequence of media data.

5. The machine readable medium of claim 1, wherein said set of data indicates
a timing parameter for transmitting said set of packets over a data communication
medium.

6. The machine readable medium of claim 1, wherein said set of data includes
instructions to provide multiplexing of information into said set of packets for
transmission over a data communication medium.

7. The machine readable medium of claim 1, wherein said transmission
protocol comprises an Internet protocol.

8. A machine readable medium having stored thereon:

information to cause a digital processing system to transmit a first time 
related sequence associated with media data according to a 
transmission protocol, and wherein said information comprises a 
second time related sequence associated with and separate from 
said first time related sequence. 

9. The machine readable storage medium of claim 8, wherein said information
is associated with a track of indicating data, and wherein said transmission
protocol comprises a packet data protocol.

10. The machine readable medium of claim 8, wherein said information is
associated with a QuickTime file.

11. The machine readable medium of claim 8, wherein said information
indicates a timing parameter for transmitting said first time related sequence as a set
of packets to be transmitted over a data communication medium.

12. The machine readable medium of claim 8, wherein said information
includes instructions to provide multiplexing of said first time related sequence into
a set of packets to be transmitted over a data communication medium.

13. The machine readable medium of claim 8, wherein said transmission
protocol comprises an Internet protocol.

14. The machine readable medium of claim 8, comprising a magnetic storage
medium.

15. The machine readable medium of claim 8, comprising an electronic storage
medium. 

16. The machine readable medium of claim 8, comprising an optical storage 
medium. 

17. A computer readable medium containing media data and information which 
indicates how to transmit said media data from a digital processing system, said 
computer readable medium comprising: 

a first time related sequence of data; 

a second time related sequence of data which indicates how to transmit said 
first time related sequence of media data in a data communication 
medium, wherein said second time related sequence of data is 
associated with said first time related sequence of data. 

18. The computer readable medium of claim 17, wherein said second time
related sequence of data is associated with a track of indicating data, and wherein 
said transmission protocol comprises a packet data protocol. 

19. The computer readable medium of claim 17, wherein said second time 
related sequence of data is associated with a QuickTime file. 

20. The computer readable medium of claim 17, wherein said second time 
related sequence of data indicates a timing parameter for transmitting said time 
related sequence of media data as a set of packets to be transmitted over a data 
communication medium. 

21. The computer readable medium of claim 17, wherein said set of data
includes instructions to provide multiplexing of said time related sequence into a
set of packets to be transmitted over a data communication medium.

22. The computer readable medium of claim 17, wherein said transmission
protocol comprises an Internet protocol.

23. The computer readable medium of claim 17, comprising a magnetic storage
medium.

24. The computer readable medium of claim 17, comprising an electronic
storage medium.

25. The computer readable medium of claim 17, comprising an optical storage
medium. 

26. A data signal embodied in a carrier wave comprising: 

a time related sequence of media data associated with a set of data to 

indicate a method to transmit said time related sequence of media 
data according to a transmission protocol, wherein said set of data 
is a time related sequence of data associated with and separate from 
said sequence of media data. 

27. The data signal of claim 26, wherein said time related sequence of media
data is further associated with data packets associated with said transmission 
protocol, which comprises a data packet protocol. 

28. The data signal of claim 26, wherein said set of data indicates a relative 
bandwidth associated with transmission of said time related sequence of media 
data. 

[Figure: flow chart of method 300; the boxes, in order, read as follows]

DETERMINE MEDIA FILE FORMAT

DETERMINE DESIRED TRANSMISSION PROTOCOL(S)

CREATE AND STORE "HINTS" FOR PACKETIZING A TIME RELATED
SEQUENCE OF MEDIA DATA IN A MEDIA FILE(S) (PACKETIZING IS
FOR TRANSMISSION ACCORDING TO THE DESIRED TRANSMISSION
PROTOCOL) (HINTS ARE NORMALLY STORED AS A TRACK OF TIME
RELATED SEQUENCE OF HINTS WHICH REFER TO OTHER TRACKS OF
MEDIA DATA)

TRANSMIT DATA FROM A TRANSMITTING SYSTEM HAVING THE
MEDIA FILE(S) AND HINTS TO A RECEIVING SYSTEM; DATA IS
TRANSMITTED BY PACKETIZING THE MEDIA DATA ACCORDING TO
THE HINTS

PRESENT, AT THE RECEIVING SYSTEM, THE MEDIA OBJECT
REPRESENTED BY THE MEDIA DATA

OPTIONALLY REASSEMBLE MEDIA FILE (IN ORIGINAL MEDIA FILE
FORMAT) FROM PACKETS OF MEDIA DATA RECEIVED AT THE
RECEIVING SYSTEM

METHOD AND APPARATUS FOR MEDIA DATA TRANSMISSION

FIELD OF THE INVENTION 

The present invention relates to methods and apparatuses for preparing 
time related sequences of media data for transmission, and more particularly to 
packetized transmission of such media data. 

INTRODUCTION AND BACKGROUND 

There are various different file structures used today to store time-based 
media: audio formats such as AIFF, video formats such as AVI, and streaming 
formats such as RealMedia. One reason that such file structures are different is 
their different focus and applicability. Some of these formats are sufficiently
widely accepted, broad in their application, and somewhat simple to
implement, and thus, may be used not only for content delivery but also as 
interchange formats. Foremost among these general formats is the QuickTime file 
format. It is used today in the majority of web sites serving time-based data; in the 
majority of authoring environments, including professional ones; and on the 
majority of multimedia CDROM titles. 

The QuickTime media layer supports the efficient display and management 
of general multimedia data, with an emphasis on time-based material (video, 
audio, etc.). The media layer uses the QuickTime file format as the storage and 
interchange format for media information. The architectural capabilities of the 
layer are generally broader than the existing implementations, and the file format is 
capable of representing more information than is currently demanded by the 
existing QuickTime implementations. 

In contrast to formats such as AVI, which were generally designed to
support local random access of synchronized media, QuickTime allows systems to 
manage the data, relationships and timing of a general multimedia presentation. In 
particular, the QuickTime file format has structures to represent the temporal 
behavior of general time-based streams, a concept which covers the time-based 
emission of network packets, as well as the time-based local presentation of 
multimedia data. 

The existing QuickTime file format is publicly described by Apple
Computer in the May 1996 File format specification, which may be found at the
QuickTime Web site, <http://www.apple.com/quicktime>.

One aspect of the QuickTime file format is the concept that the physical 
structure of media data (the layout in disk records) is independent of, and 
described by, a logical structure for the file. The file is fully described by a set of 
"movie" meta-data. This meta-data provides declarative, structural and temporal 
information about the actual media data. 

The media data may be in the same file as the description data, (the 
"movie" meta-data), or in other file(s). A movie structured into one file is 
commonly called "flat", and is self-contained. Non-flat movies can be structured 
to reference some, or all, of the media data in other files. 

As such, the format is generally suited for optimization in different 
applications. For example, when editing (compositing), data need not be rewritten 
as edits are applied and media is re-ordered; the meta-data file may be extended and 
temporal mapping information adjusted. When edits are complete, the relevant 
media data and meta-data may be rewritten into a single, interleaved, and optimized 
file for local or network access. Both the structured and the optimized files are 
valid QuickTime files, and both may be inspected, played, and reworked. 

The use of structured ("non-flat") files enables the same basic media data to
be used and re-used in any number of presentations. This same advantage applies 
when serving, as will be seen below. 

In both editing and serving, this also permits a number of other files to be 
treated as part of a movie without copying the media data. Thus editing and 
serving may be done directly from files such as Sun Microsystem's "au" audio 
format or the AVI video format, greatly extending the utility of these formats. 

The QuickTime file is divided into a set of objects, called atoms. Each
object starts with an atom header, which declares its size and type:

class Atom {
    int(32) size;
    char type[4];
    byte contents[];
}

The size is in bytes, including the size and type header fields. The type 
field is four characters (usually printable), to permit easy documentation and 
identification. The data in an object after the type field may be fields, a sequence 
of contained objects, or both. 

A file therefore is simply a sequence of objects: 

class File {
    Atom[];
}

The two important top-level objects are the media-data (mdat) and the
meta-data (moov).

The media-data object(s) contain the actual media (for example, sequences 
of sound samples). Their format is not constrained by the file format; they are not 
usually objects. Their format is described in the meta-data, not by any declarations 
physically contiguous with them. So, for example, in a movie consisting solely of 
motion-JPEG, JPEG frames are stored contiguously in the media data with no
intervening extra headers. The media data within the media data objects is
logically divided into chunks; however, there are no explicit chunk markers within 
the media data. 

When the QuickTime file references media data in other files, it is not 
required that these 'secondary' files be formatted according to the QuickTime
specification, since such media data files may be formatted as if they were the 
contents of a media object. Since the QuickTime format does not necessarily 
require any headers or other information physically contiguous with the media 
data, it is possible for the media data to be files which contain 'foreign' headers 
(e.g. UNIX "au" files, or AVI files) and for the QuickTime meta-data to contain 
the appropriate declarative information and reference the media data in the 'foreign' 
file. In this way the QuickTime file format can be used to update, without 
copying, existing bodies of material in disparate formats. The QuickTime file 
format is both an established format and is able to work with, include, and thereby 
bring forward, other established formats. 

Free space (e.g. deleted by an editing operation) can also be described by 
an object. Software reading a file that includes free space objects should ignore 
such free space objects, as well as objects at any level which it does not 
understand. This permits extension of the file at virtually any level by introducing 
new objects. 
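
The rule that readers skip objects they do not understand can be sketched in C as a walk over the top-level atoms of an in-memory file image. The function name find_atom is hypothetical, and reading the 32-bit size as a big-endian value is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Finds the first top-level atom of the given four-character type in an
 * in-memory file image. Sizes are read as 32-bit big-endian values
 * (assumed byte order); the size includes the 8 header bytes. Returns the
 * offset of the atom header, or -1 if absent or the data is malformed. */
long find_atom(const uint8_t *file, size_t len, const char type[4])
{
    size_t pos = 0;
    while (len - pos >= 8) {
        uint32_t size = ((uint32_t)file[pos] << 24) |
                        ((uint32_t)file[pos + 1] << 16) |
                        ((uint32_t)file[pos + 2] << 8) |
                        (uint32_t)file[pos + 3];
        if (size < 8 || size > len - pos)
            return -1;          /* truncated or corrupt atom */
        if (memcmp(file + pos + 4, type, 4) == 0)
            return (long)pos;
        pos += size;            /* free space and unknown atoms are skipped */
    }
    return -1;
}
```

Because every atom declares its own size, the walk never needs to interpret the contents of an atom it is not looking for.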

The primary meta-data is the movie object. A QuickTime file has exactly 
one movie object which is typically at the beginning or end of the file, to permit its 
easy location: 

class Movie {
    int(32) size;
    char type[4] = 'moov';
    MovieHeader mh;
    contents Atom[];
}

The movie header provides basic information about the overall presentation
(its creation date, overall timescale, and so on). In the sequence of contained 
objects there is typically at least one track, which describes temporally presented 
data. 

class Track {
    int(32) size;
    char type[4] = 'trak';
    TrackHeader th;
    contents Atom[];
}

The track header provides relatively basic information about the track (its 
ID, timescale, and so on). Objects contained in the track might be references to 
other tracks (e.g. for complex compositing), or edit lists. In this sequence of 
contained objects there may be a media object, which describes the media which is 
presented when the track is played. 

The media object contains declarations relating to the presentation required 
by the track (e.g. that it is sampled audio, or MIDI, or orientation information for a 
3D scene). The type of track is declared by its handler:

class handler {
    int(32) size;
    char type[4] = 'hdlr';
    int(8) version;
    bit(24) flags;
    char handlertype[4];     -- mhlr for media handlers
    char handlersubtype[4];  -- vide for video, soun for audio
    char manufacturer[4];
    bit(32) handlerflags;
    bit(32) handlerflagsmask;
    string componentname;
}

Within the media information there is likewise a handler declaration for the 
data handler (which fetches media data), and a data information declaration, which 
defines which files contain the media data for the associated track. By using this 
declaration, movies may be built which span several files. 

At the lowest level, a sample table is used which relates the temporal aspect
of the track to the data stored in the file:

class sampletable {
    int(32) size;
    char type[4] = 'stbl';
    sampledescription sd;
    timetosample tts;
    syncsampletable syncs;
    sampletochunk stoc;
    samplesize ssize;
    chunkoffset coffset;
    shadowsync ssync;
}

The sample description contains information about the media (e.g. the 
compression formats used in video). The time-to-sample table relates time in the 
track, to the sample (by index) which should be displayed at that time. The sync 
sample table declares which of these are sync (key) samples not dependent on 
other samples. 
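
The time-to-sample lookup described above can be illustrated with a minimal C sketch. The names TimeToSampleEntry and sample_at_time are hypothetical; each entry is taken to be a (count, duration) pair, as in the example listings of this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t count, duration; } TimeToSampleEntry;

/* Returns the 1-based index of the sample displayed at media time t, or 0
 * if t lies beyond the end of the track. */
uint32_t sample_at_time(const TimeToSampleEntry *e, size_t n, uint64_t t)
{
    uint32_t sample = 1;   /* first sample of the current entry */
    uint64_t start = 0;    /* media time at which that sample begins */
    for (size_t i = 0; i < n; i++) {
        uint64_t span = (uint64_t)e[i].count * e[i].duration;
        if (t < start + span)
            return sample + (uint32_t)((t - start) / e[i].duration);
        start += span;
        sample += e[i].count;
    }
    return 0;
}
```

Runs of equal-duration samples collapse into a single entry, so a constant-rate track needs only one entry regardless of its length.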

The sample-to-chunk object declares how to find the media data for a given
sample, and its description given its index:

class sampletochunk {
    int(32) size;
    char type[4] = 'stsc';
    int(8) version;
    bits(24) flags;
    int(32) entrycount;
    for (int i=0; i<entrycount; i++) {
        int(32) firstchunk;
        int(32) samplesperchunk;
        int(32) sampledescriptionindex;
    }
}

The sample size table indicates the size of each sample. The chunk offset
table indicates the offset into the containing file of the start of each chunk.

Walking the above-described structures to find the appropriate data to 
display for a given time is fairly straightforward, generally involving indexing and 
adding. Using the sync table, it is also possible to back-up to the preceding sync
sample, and roll forward 'silently' accumulating deltas to a desired starting point. 
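
The indexing and adding involved in locating a sample can be sketched in C as follows. The names ChunkRun and locate_sample are hypothetical, and sample and chunk numbers are treated as 1-based, consistent with the examples elsewhere in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first_chunk, samples_per_chunk, desc_index; } ChunkRun;

/* Locates a 1-based sample number: stores its 1-based chunk number and the
 * first sample number of that chunk. chunk_count is the number of chunk
 * offsets. Returns 1 on success, 0 if the sample is out of range. */
int locate_sample(const ChunkRun *runs, size_t n, uint32_t chunk_count,
                  uint32_t sample, uint32_t *chunk, uint32_t *first_in_chunk)
{
    uint32_t first = 1;    /* first sample number of the current run */
    for (size_t i = 0; i < n; i++) {
        uint32_t next = (i + 1 < n) ? runs[i + 1].first_chunk : chunk_count + 1;
        uint32_t chunks = next - runs[i].first_chunk;
        uint64_t in_run = (uint64_t)chunks * runs[i].samples_per_chunk;
        if (sample >= first && sample - first < in_run) {
            uint32_t k = (sample - first) / runs[i].samples_per_chunk;
            *chunk = runs[i].first_chunk + k;
            *first_in_chunk = first + k * runs[i].samples_per_chunk;
            return 1;
        }
        first += (uint32_t)in_run;
    }
    return 0;
}
```

Once the chunk is known, the file position of the sample would follow from the chunk offset table entry for that chunk plus the sizes of the samples that precede it within the chunk.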

Figure 1 shows the structure of a simple movie with one track. A similar 
diagram may be found in the QuickTime file format documentation, along with a 
detailed description of the fields of the various objects. QuickTime atoms (objects) 
are shown here with their type in a grey box, and a descriptive name above. This 
movie contains a single video track. The frames of video are in the same file, in a 
single chunk of data. It should be noted that the 'chunk' is a logical construct
only; it is not an object. Inside the chunk are frames of video, typically stored in 
their native form. There are no required headers or fields in the video frames 
themselves. 

Figure 2 is a diagram of a self-contained file with both an audio and a 
video track. Fewer of the atoms are shown here, for brevity; the pointers from the 
tracks into the media data are, of course, the usual sample table declarations, 
which include timing information. 

The QuickTime file format has a number of advantages, including: 

1) Scalability for size and bit-rates. The meta data is flexible, yet compact. 
This makes it suitable for small downloadable movies (e.g. on the Internet) 
as well as providing the basis for a number of high-end editing systems, 
which may require the use of relatively large amounts of uncompressed 
data. 

2) Physical structure is independent of the logical and temporal structure. This 
makes it possible to optimize the physical structure differently depending on 
the use the file will have. In particular, it means that a single file format is 
suitable for authoring and editing; downloading or placing on CDROMs; 
and for streaming. 

3) The file format has proven capable of handling a very broad variety of 
codec types and track types, including many not known at the time the 
format was designed. This proven ability to evolve in an upwards- 
compatible fashion is fundamental to the success of a storage format. 

Scalable, or layered, codecs can be handled in a number of ways in the 
QuickTime file format. For a streaming protocol which supports scalability, the 
samples may be tagged with the layer or bandwidth threshold to be met for 
transmitting the samples. 

Tracks which form a set of alternatives (e.g. different natural language 
sound tracks) can be tagged so that only one is selected for playback. The same 
structure can be used to select alternatives for streaming (e.g. for language 
selection). This capability is described in further detail in the QuickTime file 
format documentation. 

When QuickTime displays a movie or track, the appropriate media handler 
accesses the media data for a particular time. The media handler must correctly 
interpret the data stream to retrieve the requested data. For example, with respect 
to video media, the media handler typically traverses several atoms to find the 
location and size of a sample for a given media time. The media handler may 
perform the following: 

1. Determine the time in the media time coordinate system. 

2. Examine the time-to-sample atom to determine the sample number that 
contains the data for the specified time. 

3. Scan the sample-to-chunk atom to discover which chunk contains the 
sample in question. 

4. Extract the offset to the chunk from the chunk offset atom. 

5. Find the offset within the chunk and the sample's size by using the sample 
size atom. 
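
The five steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the QuickTime implementation: the function name locate_sample is hypothetical, the time-to-sample table is assumed to be a list of (sample_count, sample_duration) runs, the sample-to-chunk table is assumed to be a list of (firstchunk, samplesperchunk, sampledescriptionindex) entries, and sample and chunk numbers are assumed to be 1-based.

```python
def locate_sample(media_time, time_to_sample, sample_to_chunk,
                  chunk_offsets, sample_sizes):
    """Return (sample, chunk, file_offset, size) for a media time.

    Step 1 (conversion to the media time coordinate system) is assumed
    to have been done by the caller.
    """
    # Step 2: time-to-sample lookup over (count, duration) runs.
    sample, elapsed, first = None, 0, 1
    for count, duration in time_to_sample:
        span = count * duration
        if media_time < elapsed + span:
            sample = first + (media_time - elapsed) // duration
            break
        elapsed += span
        first += count
    if sample is None:
        raise ValueError("media time is past the end of the track")

    # Step 3: scan the sample-to-chunk entries to find the chunk.
    run_first_sample = 1
    for i, (first_chunk, per_chunk, _desc) in enumerate(sample_to_chunk):
        if i + 1 < len(sample_to_chunk):
            last_chunk = sample_to_chunk[i + 1][0] - 1
        else:
            last_chunk = len(chunk_offsets)
        run = (last_chunk - first_chunk + 1) * per_chunk
        if sample < run_first_sample + run:
            index = (sample - run_first_sample) // per_chunk
            chunk = first_chunk + index
            first_in_chunk = run_first_sample + index * per_chunk
            break
        run_first_sample += run
    else:
        raise ValueError("sample not covered by the sample-to-chunk table")

    # Step 4: offset of the chunk within the containing file.
    offset = chunk_offsets[chunk - 1]

    # Step 5: add the sizes of the earlier samples in the same chunk.
    offset += sum(sample_sizes[first_in_chunk - 1:sample - 1])
    return sample, chunk, offset, sample_sizes[sample - 1]

tts = [(4, 10), (2, 20)]            # 4 samples of 10 ticks, then 2 of 20
stsc = [(1, 2, 1), (3, 1, 1)]       # chunks 1-2 hold 2 samples, 3-4 hold 1
offsets = [100, 300, 500, 600]      # one file offset per chunk
sizes = [10, 20, 30, 40, 50, 60]    # one size per sample
found = locate_sample(35, tts, stsc, offsets, sizes)
```

In this example, time 35 falls in sample 4, which is the second sample of chunk 2, so its data starts 30 bytes past that chunk's offset. As the text notes, the work is only indexing and adding.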

It is often desirable to transmit a QuickTime file or other types of time 
related sequences of media data over a data communication medium, which may be 
associated with a computer network (e.g. the Internet). In many computer 
networks, the data which is transmitted into the network should generally be in a 
packet form. Normally, time related sequences of media data are not stored in the 
proper packetized format for transmission over a network. For example, media 
data files in the QuickTime format are not in a packetized format. Thus, there is a 
need to collect the data, sometimes referred to as streaming data, into packets for 
transmission over a network. 

One prior approach to address the problem of transmitting time related 
sequences of media data over a network is to send the media file over the network 
using a network or transmission protocol, such as the Hypertext Transfer Protocol 
(HTTP). Thus, the media file itself is sent from one computer system over the 
network to another computer system. However, there may be no desire to retain 
the media file at the receiving computing system. That is, when the media file is 
received and viewed or listened to at the receiving computer system, there may be 
no desire or capability by the user of that receiving computer system to store a 
copy of the file, for example, if the receiving computing system is a network 
computer or a computer with low storage capacity. 

Another alternative approach to solving the problem of how to collect data 
for transmission by packets over a network is to prepare a file which contains the 
network protocol data units in the file for a particular transmission protocol. In a 
sense, such a file may be considered a packetized file which is stored in essentially 
the same format as it will be transmitted according to the particular transmission 
protocol. Performing this operation generally involves storing the file in a 
packetized form for a particular network protocol at a particular data transmission 
rate and a particular media file format. Thus, for each different transmission 
protocol at a particular data transmission rate, the file will essentially be replicated 
in its packetized form. The fixed form of such files may restrict their 
applicability/compatibility and make it difficult to view such files locally. Thus, 
such an approach may greatly increase storage requirements in attempting to 
provide the file in various transmission protocols at various different data 
transmission rates. Moreover, each packetized file generated according to this 
alternative prior approach is generally limited to a particular media file format, and 
thus, other media file formats for the same media object (e.g. a digital movie) are 
typically packetized and stored on the sending computer system. 

Yet another approach to solving the problem of how to stream time related 
sequences of media data is to perform the packetization of the media data when 
required on the transmitting system according to the particular transmission 
protocol which is desired. This processing requires, in many cases, a relatively 
considerable amount of time, and thus, may slow the performance of the 
transmitting system. 

Thus, it is desirable to provide an improved method and apparatus for 
transmitting time related sequences of media data. 

SUMMARY OF THE INVENTION 

The present invention relates to information which allows transmission of 
time related data over a data communication medium. In one embodiment, a set of 
data includes at least one instruction to cause a digital processing system to 
generate a set of packets representing a time related sequence of media data. The 
set of packets is associated with a transmission protocol. The set of data includes 
a time related sequence of data which is associated with the time related sequence 
of media data. 



WO 99/37056 
PCT/US99/00953 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows an example of the structure of a simple movie with one 
track in the prior art. 

Figure 2 is an example of a self-contained movie file of the prior art. 

Figure 3 is a flowchart showing one example of a method according to the 
present invention. 

Figure 4 shows an example of a hint track of the present invention. 

Figure 5 shows another example of a hint track of the present invention. 

Figure 6 is a diagram of a network of computer systems in which media 
data may be exchanged and/or processed, according to one embodiment of the 
present invention. 

Figure 7 is a block diagram of a digital processing system which may be 
used in accordance with one embodiment of the present invention. 

Figure 8 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. 

Figure 9 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. 

Figure 10 is a flow diagram illustrating a method for generating hints for 
providing media data transmission, according to one embodiment of the invention. 

Figure 11 is a flow diagram illustrating a method of processing media data 
received by a receiving system in accordance with hints, according to one 
embodiment of the invention. 

Figure 12 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a generator, according to one 
embodiment of the invention. 

Figure 13 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a server, according to one 
embodiment of the invention. 

Figure 14 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a receiving system or other digital 
processing system, according to one embodiment of the invention. 

Figure 15 is a diagram of a data storage and/or communication medium 
having stored/transported thereon media and hint information, according to one 
embodiment of the invention. 

DETAILED DESCRIPTION 

The present invention provides methods and apparatuses for allowing the 
transmission, and particularly the packetized transmission, of time related 
sequences of media data, which may include, for example, video, audio, video 
and audio, etc., over a communication medium, such as in a computer network. 

In one embodiment of the present invention, a digital processing system 
creates a set of data for indicating how to transmit a time related sequence of media 
data according to a transmission protocol. Typically, this set of data is stored on a 
storage device coupled to the digital processing system. Further, this set of data is 
a time related sequence of data associated with the time related sequence of media 
data. 

The present invention may be implemented entirely in executable computer 
program instructions which are stored on a computer readable medium or may be 
implemented in a combination of software and hardware, or in certain 
embodiments, entirely in hardware. Typically, a server computer system coupled 
to a network will create the set of data, which may be referred to as a hint track and 
will store this hint track in a storage device which is coupled to the server 
computer system. When a client computer system requests a presentation (e.g. a 
viewing or listening or viewing and listening) of a media data file, the server 
system uses the hint track to determine how to packetize the media data for 
transmission to the client computer system. It will be appreciated that the present 
invention is generally applicable to time related sequences of media data, and that 
QuickTime is represented herein as one example of this general applicability. 
Thus, the invention should not necessarily be limited to QuickTime. 

Figure 3 shows one example of a method according to the present 
invention. The method 300 shown in Figure 3 begins in step 301, in which the 
media file format for the particular media data which is desired to be transmitted is 
determined. In step 303, the particular transmission protocol or protocols which 
are desired to be used is also determined. However, steps 301 and 303 are 
optional, for example, in the case where the same media file format is always 
transmitted using the same transmission protocol. 

In step 305, a digital processing system, such as a server computer 
system, creates and stores the hints for packetizing a time related sequence of 
media data in a media file. Alternatively, one computer system may create the 
hints and provide them to another system, such as a server computer system, 
which stores them for later use in a transmission process. The packetization 
allows the transmission over a network or communication medium according to the 
desired transmission protocol which was determined in step 303. In one 
embodiment of the present invention, the hints are stored as a track of a time related 
sequence of hints which refers to, but which, in one embodiment, is separate from, 
other tracks of media data. The track of hints, in one embodiment of the present 
invention, may be stored separately from the media data to which it refers. As 
such, the track of hints may be stored in a file which is distinct from another file 
containing the media data which is referred to by the track of hints, or the track of 
hints may be stored in a hint area in the file containing the media data which is 
separate and distinct from the data area containing the actual media data. In one 
embodiment of the invention, a hint track, or portion thereof, may be interpreted as 
executable instructions by the server, which executable instructions cause the 
server to packetize a time related sequence of data, which is typically, but not 
necessarily, time-based media data. In one embodiment of the present invention, 
the hints are stored on the storage device which is coupled to the transmitting 
digital processing system. 

In step 307, the data which is packetized according to the hints is 
transmitted from a transmitting system, such as a server computer system, to a 
receiving system. This media data is transmitted by packetizing the media data 
according to the hints. In one alternative embodiment of the invention, the server 
computer system may decide not to use the hints and to send the media data by an 
alternative packetization process. 

In step 309, the receiving system presents the media object which is 
represented by the media data. Typically, this presentation (which may be a 
viewing and listening of a media object or merely a viewing or merely a listening 
of the media object) is performed as the packetized data is received at the receiving 
system. The packetized data may, but need not, be stored on the receiving system 
in one embodiment of the present invention. When it is not stored, the presentation 
of the data is ephemeral in the sense that once the presentation is over, there is no 
local copy at the receiving system. In another embodiment, presentation of the media 
object may take place on the server system subsequent to creating hints for the media data 
representing the media object. In one embodiment of the invention, the media data 
is not necessarily (re)formatted, copied, etc., for packetization according to hints. 

In step 311, the receiving system may optionally reassemble the media file 
if the media file as received has been stored on the receiving system. It will be 
appreciated that the various steps of the method shown in Figure 3 may be 
performed in a different order than the one shown and described above and/or 
some of the steps may be performed simultaneously. For example, in one 
embodiment, steps 309 and 311 are performed in parallel. 

A particular implementation with QuickTime according to one embodiment 
of the present invention will now be described. In one embodiment of the present 
invention, a presentation which can be both viewed locally to the file (e.g., at a 
server, generator, etc.), and streamed over a network within a QuickTime movie is 
provided. In general, the streaming server (or another system) should have 
information about the data units to stream, their composition and timing. Since 
such information is typically temporal it may be described in tracks. A server may 
perform packetization and determine protocol information, for example, by using 
the same indexing operations as would be used to view a presentation. 

The tracks which contain instructions for the servers are sometimes 
referred to as 'hint' tracks, since such tracks represent a set of data to direct the 
server in the process of forming and transmitting packets. The QuickTime file 
format supports streaming of media data over a network as well as local playback. 
The process of sending protocol data units is time-based, just like the display of 
time-based data, and is therefore suitably described by a time-based format. A 
QuickTime file or 'movie' which supports streaming includes information about 
the data units to stream. This information is included in additional tracks of the file 
called "hint" tracks. 

Hint tracks contain instructions for a streaming server (or other digital 
processing system) which assist in the formation of packets. These instructions 
may contain immediate data for the server to send (e.g. header information) or 
reference segments of the media data. In one embodiment of the present 
invention, instructions are encoded in the QuickTime file in the same way that 
editing or presentation information is encoded in a QuickTime file for local 
playback. Instead of editing or presentation information, information may be 
provided which may allow a server to packetize the media data in a manner suitable 
for streaming using a specific network transport. 

In one embodiment of the present invention, the same media data is used in 
a QuickTime file which contains hints, whether it is for local playback, or 
streaming over a number of different transport types. Separate 'hint' tracks for 
different transport types may be included within the same file and the media may 
play over all such transport types without making any additional copies of the 
media itself. In addition, existing media may be made streamable by the addition 
of appropriate hint tracks for specific transports. According to one aspect of the 
invention, media data itself need not be recast or reformatted. 

Therefore the samples in a hint track generally contain instructions to form 
packets. These instructions may contain immediate data for the server to send 
(e.g. header information) or reference segments of the media data in another track. 
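
As a rough illustration of how a server might follow such instructions, the Python sketch below assembles one packet from a hint sample that mixes immediate data with a reference to a segment of media data. The Immediate and MediaRef types and the build_packet function are hypothetical names chosen for this sketch; the actual hint formats are those given in the Appendixes.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Immediate:
    data: bytes      # bytes carried in the hint itself, e.g. a header

@dataclass
class MediaRef:
    offset: int      # byte offset into the media data of another track
    length: int      # number of bytes to take from that offset

Instruction = Union[Immediate, MediaRef]

def build_packet(instructions: List[Instruction], media: bytes) -> bytes:
    """Form one packet by following a hint sample's instructions in order."""
    parts = []
    for ins in instructions:
        if isinstance(ins, Immediate):
            parts.append(ins.data)
        else:
            # The media bytes are read in place; the media is not rewritten.
            parts.append(media[ins.offset:ins.offset + ins.length])
    return b"".join(parts)

media = b"FRAME0FRAME1"          # stand-in for the media track's data
hint_sample = [Immediate(b"HDR"), MediaRef(offset=6, length=6)]
packet = build_packet(hint_sample, media)
```

Note that the server in this sketch needs no knowledge of the codec: it only copies immediate bytes and referenced byte ranges, which is the decoupling the hint track is meant to provide.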

In one embodiment of the present invention, a three-level design is utilized 
such that: 

1) The media data is represented as a set of network-independent tracks, 
which may be played, edited, and so on, as normal; 

2) There is a common declaration and base structure for server hint tracks; 
this common format is protocol independent, but contains the 
declarations of which protocol(s) are described in the server track(s); 

3) There is a specific design of the server hint tracks for each protocol 
which may be transmitted; all these designs use the same basic 
structure. For example, there may be designs for RTP (for the 
Internet) and MPEG-2 transport (for broadcast), or for new standard or 
vendor-specific protocols. 

In one embodiment of the present invention, the resulting streams, sent by 
the servers under the direction of the hint tracks, are normal streams, and do not 
necessarily include a trace of QuickTime information. This embodiment of the 
invention does not require that QuickTime, or its structures or declaration style, 
necessarily be either in the data on the transmission medium (e.g. network cable) 
or in the decoding station. For example, a file using H.261 video and DVI audio, 
streamed under RTP, may result, in one embodiment of the present invention, in a 
packet stream which is fully compliant with the IETF specifications for packing 
those codings into RTP. 

In one embodiment of the invention, hint tracks are built and flagged so 
that when the presentation is viewed locally, the hint tracks are essentially ignored 
by a receiving system. 

In one embodiment, a time related sequence of media data, which may, for 
example, include video, audio, etc., may be packetized by a digital processing 
system, and then presented on the same digital processing system. Furthermore, 
packetization may be ephemeral, such that the time related sequence being 
presented, stored, read, etc., is also packetized "on the fly." In one embodiment, 
hints may refer to media data that has not been copied, formatted, etc.; for 
example, the media data to which hints refer may be stored in original format on a 
read-only memory, etc. 

In one embodiment, the same hinting routine that provides packetization 
also presents the media as packetization is performed. In alternative embodiments 
of the invention, a packetized file of time related media data may be generated 
according to hint tracks and stored, for example, for later transmission. 

Figure 4 illustrates utilization of hint tracks for transporting media data, 
according to one embodiment of the invention. In Figure 4, a hint track 401 is 
shown for the media track 403. Each hint track sample, such as hint track sample 
405 — which describes how to form an RTP packet — may contain a header, and 
may reference some data from an associated media track — in this case, a video 
track 403. In the embodiment shown in Figure 4, the media data (the video 
frames) and the RTP hints have been interleaved so that the associated media file 
may be read relatively easily. In this example, each frame is shown as fitting into 
a single RTP packet. Of course, it is possible to split frames into several packets 
when needed. Conversely, multiple frames can, if desired, be placed in a single 
packet, which is commonly performed with audio data. 
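
Splitting one frame across several packets can be sketched as below. This is a simplified Python illustration in which each packet's payload is described by an (offset, length) reference into the media data; the function name split_frame and the max_payload parameter are assumptions made for the sketch, and protocol headers are left out.

```python
def split_frame(frame_offset, frame_size, max_payload):
    """Return (offset, length) references covering one frame.

    Each reference describes the media bytes for one packet; only the
    last reference may be shorter than max_payload.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    refs = []
    pos = 0
    while pos < frame_size:
        length = min(max_payload, frame_size - pos)
        refs.append((frame_offset + pos, length))
        pos += length
    return refs

# A 2500-byte frame stored at file offset 1000, with 1000-byte payloads.
refs = split_frame(1000, 2500, 1000)
```

The converse case, several small audio frames in one packet, would simply place several such references in a single hint sample.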

As discussed above, the logical structure described above need not imply 
physical structure. The meta data may be cached in memory, and the hint track 
samples physically interleaved with the media samples to which they refer (as is 
shown in Figure 4). 

Alternatively, it is possible to write a new set of meta data and media data, 
containing the hint tracks, which references and augments the meta data and media 
data in an existing presentation. Figure 5 illustrates utilization of hint tracks to 
reference media data in a separate file, according to one embodiment of the 
invention. In Figure 5, two movie files 502 and 504 are shown, each with their 
own meta-data. The first, the movie file 502, includes a video track. The second, 
the movie file 504, contains both a video track and a hint track, but the meta-data 
declares that the media data for the video track is in the first movie 502. Thus the 
hints associated with the movie file 504 also point to the media data in the first 
movie 502. 

In one embodiment of the present invention, a media file may contain 
packetization hint tracks for multiple protocols. As such, each track may contain 
declarations of the protocol (and protocol parameters, if appropriate) for which the 
hint track is appropriate. These tracks may all, of course, reference media data 
from the basic media tracks in the file. The desire for protocol independence and 
extensibility may be met in the described manner. 

In one embodiment of the present invention, hint tracks need not use all the 
data in the media tracks. The hint tracks may use a subset of the data (e.g. by 
omitting some video frames) to reach a bandwidth threshold, or for other reasons. 
Since multiple hint tracks may be provided for the same protocol, differing subsets 
of the same basic media information at different rates may be provided. As such, 
the present invention may provide improved scalability over prior methods and 
apparatuses. 
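
One way a server might choose among several hint tracks is sketched below in Python. The dictionary keys (name, protocol, rate), the function name choose_hint_track, and the selection rule (highest declared rate that fits) are all assumptions for illustration; the specification does not prescribe a selection policy.

```python
def choose_hint_track(tracks, protocol, max_rate):
    """Pick the hint track for a protocol with the highest rate that fits.

    Assumption: each track is a dict declaring its 'protocol' and the
    data 'rate' (bits per second) its packet stream needs.
    Returns None when no track for the protocol fits within max_rate.
    """
    candidates = [
        t for t in tracks
        if t["protocol"] == protocol and t["rate"] <= max_rate
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t["rate"])

tracks = [
    {"name": "rtp-low", "protocol": "RTP", "rate": 56_000},
    {"name": "rtp-high", "protocol": "RTP", "rate": 300_000},
    {"name": "ts", "protocol": "MPEG-2", "rate": 1_500_000},
]
chosen = choose_hint_track(tracks, "RTP", 128_000)
```

All three hint tracks in this example would reference the same media tracks, so supporting another rate or protocol adds only hint data, not another copy of the media.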

It should be emphasized that though the hint tracks themselves, and the 
QuickTime meta-data, should, in one embodiment, be in QuickTime files, the base 
media can be left in any file type which QuickTime can import and reference in 
place. In one embodiment of the present invention, the meta-data in the movie file 
may include a data reference which declares that the media data is in another file. 
The sample table offsets and pointers may thus refer to data in this 'foreign' file. 
Thus, according to one embodiment of the present invention, existing legacy 
formats such as "au" audio files, "AVI" audio/video files, and MIDI files, may be 
streamed without requiring the copying or reformatting of the base media data. 
Since the base media data is not written to, but merely augmented by QuickTime 
declarations and hint information in separate files, the base media data may also be 
provided on read-only machine readable media such as CDROM. 

In one embodiment of the present invention, the hint tracks embody the 
results of off-line computation and are typically optimized to provide the server 
with information to support packetization, and if needed, multiplexing. 

Example hints for RTP (the IETF standard Real-time Transport Protocol) and 
MPEG-2 transport are shown in Appendixes A-C. 

In one embodiment of the present invention, a single file may support hint 
tracks for multiple protocols, or multiple different parameterizations of the same 
protocols, without undue space overhead. New protocols, and their associated 
hint tracks, may be designed without disrupting systems relying on existing 
protocols. Thus the invention, at least in one embodiment, is protocol-neutral. 

In the QuickTime file format, a track may be added to the movie by 
updating or copying and augmenting the meta-data. If the media data is in files 
separate from the meta-data, or optimized interleave is not required, this can be a 
relatively simple and efficient operation. 

In one embodiment of the present invention, tracks may be extracted by 
building a new set of movie meta-data which contains only one track, and which 
can, if desired, reference the media data in the original. 

For example, in one embodiment of the present invention, a new audio 
track may be added which is marked as being an alternative to a set of other audio 
tracks. If it is also marked with the language code (e.g. French, or Tagalog), then 
the appropriate track may be selected at presentation time. 

SMPTE time-code tracks are an example of elementary streams which may 
be present, added, or removed, as need arises, according to one embodiment of 
the invention. 

According to one aspect of the invention, hint tracks may permit the 
development of new formats for new protocols without causing compatibility 
issues for existing servers or local playback. In addition, new media tracks may 
be added over the life of the file format while maintaining backwards 
compatibility. 

In one embodiment of the present invention, the areas of extensibility 
include: 

a) New track types which can be defined for media types not covered by the 
current QuickTime file format (e.g. laboratory instrument readings). 

b) New coding types for existing tracks which may be defined (e.g. video or 
audio codecs). There is explicit provision for their codec-specific initialization 
information. 

c) New hint track types which may be defined for new protocols, and a file 
which may contain hint information for more than one protocol without 
incurring a space overhead for the media data itself. 

Existing content on read-only media may be used with the present 
invention (e.g., prepackaged movies on CD ROM, DVD, etc.). 

Furthermore, according to one aspect of the invention, various "foreign" 
file formats may be used. In one embodiment of the present invention, for 
example, if the existing content is either in QuickTime format, or can be imported, 
it may be edited and streamed without requiring copying or re-formatting. 

In one embodiment of the present invention, if a codec supports striping of 
the media data to achieve scalability of bandwidths, then these striped bandwidths 
may be represented using multiple stream tracks. Each track may represent a 
different bandwidth. Tracks may be grouped together in selected subsets of the 
basic media. 

In one embodiment of the present invention, if a protocol supports 
bandwidth scalability, then the hint track itself may contain information for each 
protocol data unit (sample in the hint track). Information may include the 
bandwidth threshold above which the protocol data unit should be delivered to the 
network. Thus, hint tracks may indicate an available bandwidth as being high, 
low, etc., and/or other information relating to bandwidth for data transmission. 
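
The per-unit bandwidth threshold can be illustrated with the following Python sketch. It assumes each protocol data unit is tagged with a threshold in bits per second and that a unit is delivered when the available bandwidth is at or above that threshold; the function name select_pdus and the tuple layout are hypothetical.

```python
def select_pdus(pdus, available_bandwidth):
    """Return the names of the protocol data units to deliver.

    Assumption: pdus is a list of (name, threshold) pairs, and a unit is
    sent when available_bandwidth >= threshold.
    """
    return [name for name, threshold in pdus
            if available_bandwidth >= threshold]

pdus = [
    ("base layer", 0),
    ("enhancement 1", 128_000),
    ("enhancement 2", 512_000),
]
to_send = select_pdus(pdus, 256_000)
```

With 256 kbit/s available, the base layer and the first enhancement layer are delivered and the second enhancement layer is withheld.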

In one embodiment of the present invention, if the protocol is a 
multiplexing protocol (e.g. MPEG-2 transport) then different hint tracks may be 
built which use a different subset of the elementary stream tracks to achieve 
different data-rates. Hence, some tracks may be omitted entirely for low bit-rate 
transmission. 

In one embodiment of the present invention, if it is desired to record the 
base data using different codecs, then those tracks may be formed into a group of 
alternatives, and only one selected for presentation. The selection of which track 
to use for presentation is typically protocol-dependent and may be achieved by 
using the hint track approaches described herein. 

In one embodiment of the present invention, encryption may also be pre- 
applied to a media file. In this case, the encrypted data may be stored in either (a) 
a new elementary stream (a new track) which is linked to the original media data 
(or the original media data may be removed if it is no longer needed) or (b) the hint 
track itself. In case (b), it is possible that the hint track does not extract any data 
from the elementary un-encrypted stream on the fly. Thus, all of the media data 
may be in the hint track as well as the streaming packet protocol data unit 
information, because the media data may be transformed by encryption. 

As an example of embedded object content information, the IETF session 
description information for a whole movie, and for individual tracks, may be 
stored in the meta-data for the RTP hint tracks, as user atoms. 

In one embodiment of the present invention, a file format typically contains 
both media data in a playable format, and streaming information. In one 
embodiment, it is possible to stream directly from this format with relatively low 
overhead, while preserving the media independence, protocol independence, and 
ability to present the media locally. 

According to one aspect of the invention, hint tracks may abstract detailed 
knowledge of codecs, timing and packetization, into an off-line preparation 
process. Thus, following the hint tracks to generate the data stream may be 
relatively simple and require no specialized knowledge of the media being 
streamed. Thus, decoupling of a server, for example, from the details of the data 
content may be provided, according to one aspect of the invention. 

In one embodiment of the present invention, a set of hint tracks may be 
used to construct a file which is directly optimized for streaming — for example, by 
laying out network PDUs on disk at logical disk boundaries, in the time sequence 
in which they should be sent. Such a file may no longer be a general presentation, 
but may be streamed. In one embodiment, packetized files created with hint tracks 
may be stored and, for example, later optimized for streaming. 

In one embodiment of the present invention, by encapsulating foreign file 
formats, media data may be retained in other formats while still being published in 
QuickTime. For example, an existing format may be directly encapsulated into a 
new media data file by applying the proper wrapper, or may be left intact and 
referred to in segments or as a whole by the hint track, allowing the legacy formats 
to be streamed without copying. A single movie may contain pieces selected from 
multiple legacy formats. This invention does not constrain the base media format. 

In general, a common format which spans capture, authoring and editing, 
download and streaming, will generally provide flexibility. Material may be 
reworked after use, or used in multiple ways, without being copied or re- 
formatted. In one embodiment of the present invention, it is possible to re-work 
and re-use material which has been hinted, by stripping the hint tracks, using 
standard editors, and then re-hinting after editing is completed. 

If it is desired that a media file be downloaded for local viewing, an 
optimized interleaved file may be built for that purpose, with the streaming meta- 
data in a separate declaration file referencing the same base media data. The 
download may not, therefore, include the streaming information, and yet the media 
data may be present only once at a streaming server. 

By separating logical structure from physical structure, the physical 
structure of the file may be optimized differently depending on the application (e.g. 
editing, local viewing, streaming). 

By permitting the existence of multiple hint tracks for each media track, in 
one embodiment of the present invention, the file may be published by streaming 
over multiple protocols, without requiring multiple copies of the media. 
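
As a hedged illustration of this arrangement, the following sketch keeps each media track once and attaches several hint tracks to it, one per protocol. The class names, field names, and protocol strings are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MediaTrack:
    track_id: int
    samples: list  # media samples, stored once


@dataclass
class HintTrack:
    protocol: str        # illustrative protocol name, e.g. "RTP"
    media_track_id: int  # the media track this hint track refers to


@dataclass
class Movie:
    media_tracks: dict = field(default_factory=dict)
    hint_tracks: list = field(default_factory=list)

    def hint_track_for(self, media_track_id, protocol):
        """Return the hint track for one media track and protocol, or None."""
        for hint in self.hint_tracks:
            if hint.media_track_id == media_track_id and hint.protocol == protocol:
                return hint
        return None
```

Selecting a different protocol in this sketch changes only which hint track is followed; the media samples themselves are not duplicated.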

Figure 6 is a diagram of a network of computer systems in which media 
data may be processed, according to one embodiment of the present invention. As 
shown in Figure 6, a number of client computer systems, one or more of which 
may represent one implementation of the receiving system described above with 
reference to Figure 3, are coupled together through an Internet 622. It will be 
appreciated that the term "Internet" refers to a network of networks. Such 
networks may use a variety of protocols for exchange of information, such as 
TCP/IP, ATM, SNA, SDI, etc. The physical connections of the Internet and the 
protocols and communication procedures of the Internet are well known to those in 
the art. Access to the Internet 622 is typically provided by Internet service 
providers (ISPs), such as the ISP 624 and the ISP 626. Users on client systems, 
such as the client computer systems 602, 604, 618, and 620, generally obtain 
access to the Internet through Internet service providers, such as ISPs 624 and 
626. Access to the Internet may facilitate transfer of information (e.g., email, text 
files, media files, etc.) between two or more digital processing systems, such as 
the client computer systems 602, 604, 618, and 620 and/or a Web server system 
628. For example, one or more of the client computer systems 602, 604, 618, 
and 620 and/or the Web server 628 may provide media data (e.g., video and 
audio, or video, or audio) to another one or more of the client computer systems 
602, 604, 618, and 620 and/or the Web server 628. Such may be provided in 
response to a request. As described herein, such media data may be transferred in 
the system 600 according to hints. Such hints, in one embodiment of the invention, 
may be created according to a specific format of the media data and/or a specific 
data communication (e.g., network) protocol(s). 

The Web server 628 is typically comprised of at least one computer system 
to operate with one or more data communication protocols, such as the protocols 
of the World Wide Web, and as such, is typically coupled to the Internet 622. 
Optionally, the Web server 628 may be part of an ISP which may provide access 
to the Internet and/or other network for client computer systems. The client 
computer systems 602, 604, 618, and 620 may each, with appropriate web 
browsing software, access data, such as HTML documents (e.g., Web pages), 
which may be provided by the Web server 628. Such data may provide media, 
such as QuickTime movies, which may be presented by the client computer 
systems 602, 604, 618, and 620. 

The ISP 624 provides Internet connectivity to the client computer system 
602 via a modem interface 606, which may be considered as part of the client 
computer system 602. The client computer system may be a conventional 
computer system, such as a Macintosh computer, a "network" computer, a 
handheld/portable computer, a Web TV system, or other types of digital 
processing systems (e.g., a cellular telephone having digital processing 
capabilities). Similarly, the ISP 626 provides Internet connectivity for the client 
computer systems 604, 618 and 620, although as depicted in Figure 6, such 
connectivity may vary between various client computer systems, such as the client 
computer systems 602, 604, 618, and 620. For example, as shown in Figure 6, 
the client computer system 604 is coupled to the ISP 626 through a modem 
interface 608, while the client computer systems 618 and 620 are part of a Local 
Area Network (LAN). The interfaces 606 and 608, shown as modems 606 and 
608, respectively, in Figure 6, may be an analog modem, an ISDN modem, a 
cable modem, a satellite transmission interface (e.g., "Direct PC"), a wireless 
interface, or other interface for coupling a digital processing system, such as a 
client computer system, to another digital processing system. The client computer 
systems 618 and 620 are coupled to a LAN bus 612 through network interfaces 
614 and 616, respectively. The network interfaces 614 and 616 may be an 
Ethernet-type, Asynchronous Transfer Mode (ATM), or other type of network 
interface. The LAN bus is also coupled to a gateway digital processing system 
610, which may provide firewall and other Internet-related services for a LAN. 
The gateway digital processing system 610, in turn, is coupled to the ISP 626 to 
provide Internet connectivity to the client computer systems 618 and 620. The 
gateway digital processing system 610 may, for example, include a conventional 
server computer system. Similarly, the Web server 628 may, for example, include 
a conventional server computer system. 

The system 600 may allow one or more of the client computer systems 
602, 604, 618, and 620 and/or the Web server 628 to provide media data (e.g., 
video and audio, or video, or audio) to another one or more of the client computer 
systems 602, 604, 618, and 620 and/or the Web server 628. Such data may be 
provided, for example, in response to a request by a receiving system, which may 
be, for example, one or more of the client computer systems 602, 604, 618, and 
620. As described herein, such media data may be transferred in the system 600 
according to hints or hint tracks. Such hints, in one embodiment of the invention, 
may be created according to a specific format of the media data and/or a specific 
data communication (e.g., network) protocol(s) to allow, according to one aspect 
of the invention, packetization of media data. 

Figure 7 is a block diagram of a digital processing system which may be 
used in accordance with one embodiment of the present invention. For example, 
the digital processing system 650 shown in Figure 7 may be used as a client 
computer system, a Web server system, a conventional server system, etc. 
Furthermore, the digital processing system 650 may be used to perform one or 
more functions of an Internet service provider, such as the ISP 624 or 626. The 
digital processing system 650 may be interfaced to external systems through a 
modem or network interface 668. It will be appreciated that the modem or 
network interface 668 may be considered as part of the digital processing system 
650. The modem or network interface 668 may be an analog modem, an ISDN 
modem, a cable modem, a token ring interface, a satellite transmission interface, a 
wireless interface, or other interface(s) for providing a data communication link 
between two or more digital processing systems. 

The digital processing system 650 includes a processor 652, which may 
represent one or more processors and may include one or more conventional types 
of such processors, such as a Motorola PowerPC processor, an Intel Pentium (or 
x86) processor, etc. A memory 654 is coupled to the processor 652 by a bus 656. 
The memory 654 may be a dynamic random access memory (DRAM) and/or may 
include static RAM (SRAM). The processor may also be coupled to other types of 
storage areas/memories (e.g., cache, Flash memory, disk, etc.), which could be 
considered as part of the memory 654 or separate from the memory 654. 

The bus 656 further couples the processor 652 to a display controller 658, 
a mass memory 662, the modem or network interface 668, and an input/output 
(I/O) controller 664. The mass memory 662 may represent a magnetic, optical, 
magneto-optical, tape, and/or other type of machine-readable medium/device for 
storing information. For example, the mass memory 662 may represent a hard 
disk, a read-only or writeable optical CD, etc. The display controller 658 controls 
in a conventional manner a display 660, which may represent a cathode ray tube 
(CRT) display, a liquid crystal display (LCD), a plasma display, or other type of 
display device. The I/O controller 664 controls I/O device(s) 666, which may 
include one or more keyboards, mouse/trackball or other pointing devices, 
magnetic and/or optical disk drives, printers, scanners, digital cameras, 
microphones, etc. 

It will be appreciated that the digital processing system 650 represents only 
one example of a system, which may have many different configurations and 
architectures, and which may be employed with the present invention. For 
example, Macintosh and Intel systems often have multiple busses, such as a 
peripheral bus, a dedicated cache bus, etc. On the other hand, a network 
computer, which may be used as a digital processing device of the present 
invention, may not include, for example, a hard disk or other mass storage device, 
but may receive routines and/or data from a network connection, such as the 
modem or interface 668, to be processed by the processor 652. Similarly, a Web 
TV system, which is known in the art, may be considered to be a digital 
processing system of the present invention, but such a system may not include one 
or more I/O devices, such as those described above with reference to I/O device(s) 
666. Additionally, a portable communication and data processing system, which 
may employ a cellular telephone and/or paging capabilities, may be considered a 
digital processing system which may be used with the present invention. 

In the system 650 shown in Figure 7, the mass memory 662 (and/or the 
memory 654) may store media (e.g., video, audio, movies, etc.) which may be 
processed according to the present invention (e.g., by way of hints). Alternatively, 
media data may be received by the digital processing system 650, for example, via 
the modem or network interface 668, and stored and/or presented by the display 
660 and/or I/O device(s) 666. In one embodiment, packetized media data may be 
transmitted across a data communication network, such as a LAN and/or the 
Internet, in accordance with hint tracks. On the other hand, the processor 652 may 
execute one or more routines to use a file with one or more hint tracks, or 
alternatively, to create one or more hint tracks, to process media (e.g., a pre- 
packaged movie, audio file, video file, etc.) for presentation or packetization 
according to the hint tracks. Such routines may be stored in the mass memory 
662, the memory 654, and/or another machine-readable medium accessible by the 
digital processing system 650. In one embodiment, the digital processing system 
650 may process media data having hint tracks embedded therein. Similarly, such 
embedded media data may be stored in the mass memory 662, the memory 654, 
and/or another machine-readable medium accessible by the digital processing 
system 650. 

Figure 8 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. The system 680 shown in 
Figure 8 includes a receiving system, which is depicted as a client data processing 
system 682 coupled to a server 694, via a data communication link 686. The 
server 694 and/or client data processing system may, for example, represent one 
or a combination of the devices/systems described with reference to Figures 6 and 
7. 

The server 694 includes a hint generation and processing unit 688, a media 
processing unit 690, and a data communication unit 692, each of which may 
include hard-wired circuitry or machine-executable instructions or a combination 
thereof. Furthermore, at least a portion of such hard-wired circuitry and/or 
machine-executable instructions may be shared between a combination of the hint 
generation and processing unit 688, the media processing unit 690, and the data 
communication unit 692. In one embodiment, at least one storage area/memory 
(e.g., a machine-readable medium) having appropriate routines and/or data stored 
therein coupled to at least one processor is utilized, at least in part, to implement 
one or a combination of the hint generation and processing unit 688, the media 
processing unit 690, and the data communication unit 692. 

In one embodiment, the hint generation and processing unit 688 creates 
and stores hints for packetization of media data processed by the media processing 
unit 690. As described above, the hints may be generated and stored as a separate 
file, relative to media files or may be embedded with media data. If more than one 
media format is to be processed, an appropriate format may be taken into 
consideration by the hint generation and processing unit 688 to generate the hints. 
Information about the media format may be provided by the media processing unit 
690, which may also provide the media data (e.g., media files of video, audio, or 
video and audio, etc.). Similarly, the data communication unit 692 may provide 
one or more data communication (e.g., network) protocols for exchange of such 
media data, packetized according to the hints, via the data communication link 686. 
As such, the hint generation and processing unit may determine, based on media 
format information provided by the media processing unit 690 and data 
communication protocol information provided by the data communication unit 
692, appropriate hints and packetization of media and/or the hints for transfer to a 
receiving digital processing system, such as the client data processing system 682. 
In one embodiment, the streaming of the media and hints is done in accordance 
with the QuickTime format. 

In response to media data and hint packets received via the data 
communication link 686, the client data processing system 682 may present a 
media object represented by the media data. Such presentation may be performed 
ephemerally, as described above. In one embodiment of the invention, the media 
data may optionally be stored by the client data processing system 682 and 
reassembled, for example, at a later time, for presentation and/or transmission by 
the client data processing system 682. 

Figure 9 is a block diagram of a system that utilizes hints to transfer media 
data, according to one embodiment of the invention. In particular, Figure 9 
depicts an embodiment of the invention wherein a separate digital processing 
system, referred to as a generator, may generate hints (or hint tracks) to provide to 
another system, such as a server, that uses the hints to packetize media data for 
transfer to another system, such as a client computer system. A system 696 is 
shown in Figure 9, which includes a server 700 which may exchange data, via the 
data communication link 686, with the client data processing system 682. 
However, in the embodiment shown in Figure 9, the server 700 does not generate 
the hints. Rather, a generator 710, coupled to the server 700 by a data 
communication link 708, includes a hint generation unit 712 to generate hints that 
are used to packetize media data. 

In one embodiment, the operation of the system 696 is as follows: the 
server 700 makes a request to the generator 710 to generate hints for one or more 
media files containing media data. For example, the media files may be stored in 
the server 700 on a machine-readable medium. The request may include 
information to indicate the format of the media file and/or a data communication 
protocol for transmission of the media data and/or other data. The data 
communication protocol may be related to the data communication link 686, which 
may, in one embodiment of the invention, be associated with a network connection 
having particular physical and logical characteristics to facilitate exchange of media 
and/or other data between the server 700 and the client data processing system 
682. In response to the request, the hint generation unit 712 generates appropriate 
hints, which may be associated with a time-related hint track, and provides the 
hints to the server 700. In response to the hints received from the generator 710, 
via the data communication link 708, the server 700, and in particular, a hint 
processing unit 702 uses the hints to packetize the media data for transmission to 
the client data processing system 682. 
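
The exchange between the server 700 and the generator 710 can be sketched as follows. This is only an illustration under stated assumptions: the classes `HintRequest`, `Generator`, and `Server`, the `max_payload` field, and the representation of a hint as an (offset, length) pair are all hypothetical and are not taken from the described embodiment.

```python
from dataclasses import dataclass


@dataclass
class HintRequest:
    media_format: str   # format of the media file, e.g. "QuickTime"
    protocol: str       # data communication protocol, e.g. "RTP"
    sample_sizes: list  # byte length of each media sample, in time order
    max_payload: int    # assumed payload limit for the link


class Generator:
    """Stands in for the generator 710 with its hint generation unit 712."""

    def generate_hints(self, request):
        hints = []
        offset = 0
        for size in request.sample_sizes:
            # Split each sample into (offset, length) pieces no larger than max_payload.
            pos = 0
            while pos < size:
                length = min(request.max_payload, size - pos)
                hints.append((offset + pos, length))
                pos += length
            offset += size
        return hints


class Server:
    """Stands in for the server 700 with its hint processing unit 702."""

    def __init__(self, media_data, generator):
        self.media_data = media_data
        self.generator = generator

    def packetize(self, request):
        # Ask the generator for hints, then follow them to cut the media into packets.
        hints = self.generator.generate_hints(request)
        return [self.media_data[off:off + length] for off, length in hints]
```

In this sketch the server never inspects the media content; it only copies the byte ranges named by the hints it receives.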

In response to media data and hint packets received via the data 
communication link 686, the client data processing system 682 may present a 
media object represented by the media data. Such presentation may be performed 
ephemerally, as described above. In one embodiment of the invention, the media 
data may optionally be stored by the client data processing system 682 and 
reassembled, for example, at a later time, for presentation and/or transmission by 
the client data processing system 682. 

Figure 10 is a flow diagram illustrating a method for generating hints for 
providing media data transmission, according to one embodiment of the invention. 
In step 720, a media format is determined for media data to be transmitted, if more 
than one format will be used. If only one format is used, step 720 may not be 
performed. In step 722, an appropriate data communication protocol(s) is 
determined, again, assuming that more than one (protocol) may be used. In step 
724, based on the media format and the data communication protocol(s) (one or 
both of which may have been selected/configured), hints (e.g., hint tracks) related 
to media data transmission are created and stored. 
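
Steps 720, 722 and 724 can be sketched, under loose assumptions, as three small functions. The function names, the selection rules (file extension for the format, first supported protocol as a fallback), and the dictionary used to represent a hint are hypothetical choices for illustration and are not prescribed by the method of Figure 10.

```python
def determine_media_format(filename, known_formats):
    """Step 720 (sketch): pick a format by file extension; skipped if only one is known."""
    if len(known_formats) == 1:
        return next(iter(known_formats.values()))
    extension = filename.rsplit(".", 1)[-1].lower()
    return known_formats[extension]


def determine_protocol(requested, supported):
    """Step 722 (sketch): use the requested protocol if supported, else the first supported."""
    return requested if requested in supported else supported[0]


def create_hints(media_format, protocol, sample_times):
    """Step 724 (sketch): one time-related hint per media sample."""
    return [
        {"time": t, "format": media_format, "protocol": protocol, "sample": i}
        for i, t in enumerate(sample_times)
    ]
```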

In step 726, which is optional, the hints may be transmitted to another 
digital processing system. In one embodiment of the invention, for example, the 
method of Figure 10, at least in part, may be performed exclusively by one digital 
processing system (e.g., a server). In an alternative embodiment, the method of 
Figure 10, at least in part, may be performed by two or more digital processing 
systems. For example, attributes of media data may be provided by a server or 
other system to another digital processing system, such as a generator. In 
response, the generator may determine, based on the attributes, an appropriate 
media format, data communication protocol(s), and hints for packetization of 
media data, which may be stored at the server. Alternatively, the server may 
provide the appropriate media format and protocol(s) to the generator, which could 
then generate hints. The generator may transmit the hints to the server or other 
digital processing system, which could packetize media data according to the hints. 

Figure 11 is a flow diagram illustrating a method of processing media data 
received by a receiving system in accordance with hints, according to one 
embodiment of the invention. In step 730, media data transmitted to a 
receiving system in accordance with hints or hint tracks is received by the 
receiving system. In one embodiment, the receiving system may receive 
packetized media data, as well as packetized hint tracks. The hint tracks, in one 
embodiment of the invention, may be associated with at least portions of the media 
data. Such data may be received by the receiving system in response to a request 
that may be made by the receiving system. For example, in one embodiment, the 
receiving system may be a client computer system and the request may be made to 
a server or other digital processing system for the media data. In response, the 
server may generate (or have generated for it by a separate digital processing 
system) hints for packetizing the media data, and transmit the packetized media 
data, which may include hints, to the receiving system. 

In step 732, a media object represented by the media data received by the 
receiving system is presented by the receiving system. For example, the media 
data may include video, audio, or a combination thereof that is "presented" by the 
receiving system, for example, on a display and speaker(s). As mentioned above, 
the media data may be associated with a QuickTime movie. 

Optionally, in step 734, the media data, which may include hints, may be 
stored by the receiving system as a media file(s). Thus, in alternative 
embodiments of the invention, step 732 may not be performed as the media data is 
received, or may be performed before, after, or in parallel with step 734. 

In step 734, the stored media file may optionally be reassembled and/or 
presented. As such, step 732 may be performed subsequent to step 734. 

Figure 12 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a generator, according to one 
embodiment of the invention. It will be appreciated that the actual memory that 
stores the elements shown in and described below with reference to Figure 12 may 
be one or several elements, such as one or more disks (which may, for example, 
be magnetic, optical, magneto-optical, etc.), the memory 654 and/or the mass 
memory 662 described above with reference to Figure 7. Furthermore, in one 
embodiment where the generator, with which the machine readable storage 
medium shown in Figure 12 is associated, is a network computer, one or more of 
the elements of the machine readable storage medium may be stored at another 
digital processing system and downloaded to the generator. Furthermore, the 
elements described with reference to the machine readable storage medium may, at 
some point in time, be stored in a non-volatile mass memory (e.g., a hard disk). 
Conversely, at other times, the elements of the machine storage medium may be 
dispersed between different storage areas, such as DRAM, SRAM, disk, etc. 

Figure 12 shows a machine readable storage medium 740. In one 
embodiment, the machine readable storage medium is utilized, at least in part, by a 
digital processing system that generates hints or hint tracks, i.e., a generator, in 
accordance with one or more method(s) of the invention. The generator, as 
described with reference to Figure 8, may be integrated into a digital processing 
system that transmits media data according to the hint tracks, or may be, as 
described with reference to Figure 9, a digital processing system that creates and 
provides the hints to another digital processing system, such as a server, which 
utilizes the hints to packetize and transmit media data. 

As shown in Figure 12, the machine readable storage medium 740 
typically includes a number of elements. For example, the machine readable 
storage medium 740 includes software for providing operating system 
functionality to the generator, as depicted by a generator operating system (OS) 
742. A network transmission routine(s) 748 provides data communication 
functionality, such as routines, protocols, etc., to allow the generator to transmit 
and receive data via a data communication link. 

In addition, the machine readable storage medium 740 includes routines 
and data for creating hints associated with media transmission. As such, the 
machine readable storage medium 740 may optionally include information 750, 
which may provide information relating to one or more data communication 
protocols and media formats which may be necessary for creation of hints by a 
hint creation routine(s) 744. For example, the information 750 may include 
information relating to QuickTime movies, RTP, MPEG, etc. However, such 
information may, at least in part, be integrated into the hint creation routine 744 
and/or be provided to the generator by a remote digital processing system. 

The hints created by the hint creation routine(s) 744 may be stored as 
created hints 746 and/or stored/transmitted elsewhere (e.g., at a remote digital 
processing device, which may be a server). The hints are hint tracks that are time- 
related for packetization and transmission of media data, which is also time-related 
(e.g., video, audio, video and audio, etc.). 

Although the machine readable storage medium 740 is described with 
reference to a generator, the medium 740, at least in part, may be part of a number 
of types of digital processing systems, data storage media, etc. For example, the 
machine readable storage medium 740, at least in part, may be included as part of a 
server or other digital processing system. Furthermore, the machine readable 
storage medium 740, at least in part, may be included as part of a software utility 
on one or more disks or other machine readable media. 

Figure 13 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a server, according to one 
embodiment of the invention. It will be appreciated that the actual memory that 
stores the elements shown in and described below with reference to Figure 13 may 
be one or several elements, such as one or more disks (which may, for example be 
magnetic, optical, magneto-optical, etc.), the memory 654 and/or the mass 
memory 662 described above with reference to Figure 7. Furthermore, in one 
embodiment where the server, with which the machine readable storage medium 
shown in Figure 13 is associated, is a network computer, one or more of the 
elements of the machine readable storage medium may be stored at another digital 
processing system and downloaded to the server. Furthermore, the elements 
described with reference to the machine readable storage medium may, at some 
point in time, be stored in a non-volatile mass memory (e.g., a hard disk). 
Conversely, at other times, the elements of the machine storage medium may be 
dispersed between different storage areas, such as DRAM, SRAM, disk, etc. 

Figure 13 shows a machine readable storage medium 760. In one 
embodiment, the machine readable storage medium is utilized, at least in part, to 
packetize media data for transmission on a data communication link in accordance 
with one or more method(s) of the invention. The machine readable storage 
medium 760 may be associated with a server, such as the server 694 described 
with reference to Figure 8, to include routines to create hint tracks and transmit 
media data according to the hint tracks. In another embodiment, the machine 
readable storage medium 760 may be associated with a digital processing system, 
such as the server 700 described with reference to Figure 9, wherein a digital 
processing system, such as a generator, includes routines to create hints, and the 
server, using the hints as processed by routines provided by the machine readable 
storage medium 760, may packetize and transmit media data. 

The machine readable storage medium 760 includes a number of elements. 
For example, the machine readable storage medium 760 includes software for 
providing operating system functionality to the server, as depicted by a server 
operating system (OS) 762. A network transmission routine(s) 768 provides data 
communication functionality, such as routines, protocols, etc., to allow the server 
to transmit and receive data via a data communication link. 

In addition, the machine readable storage medium 760 includes a media 
packetization routine 770 for packetizing media data, which may be time-related, 
based on hints, and which may also be packetized. Accordingly, the machine 
readable storage medium 760 includes a media data storage area 764 and a hint 
storage area 766 to store media data (which may, for example, be QuickTime 
movies or other media tracks) and hints (e.g., hint tracks), respectively. The hints 
may include hint tracks that are time-related for packetization and transmission of 
media data, which is also typically time-related (e.g., video, audio, video and 
audio). In one embodiment, the hint tracks are packetized separately from the 
media data packets. In one embodiment, hints include pointer information 
identifying media data (e.g., a particular packet(s)) which may be in a separate 
media file. 
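
A minimal sketch of how a packetization routine might follow such pointer information is given below. The record layouts `MediaPointer` and `Hint`, the `header` field, and the use of an in-memory dictionary in place of separate media files are assumptions made for illustration; they are not the hint formats defined in this description.

```python
from dataclasses import dataclass


@dataclass
class MediaPointer:
    file_name: str  # which media file holds the bytes
    offset: int     # byte offset within that file
    length: int     # number of bytes to copy


@dataclass
class Hint:
    send_time: float  # when the packet should be sent
    header: bytes     # protocol header bytes prepared off-line
    pointers: list    # MediaPointer entries making up the payload


def build_packet(hint, media_files):
    """Assemble one packet by following the hint's pointers into the media files."""
    payload = b"".join(
        media_files[p.file_name][p.offset:p.offset + p.length] for p in hint.pointers
    )
    return hint.header + payload
```

Because the hint in this sketch carries only references, the media bytes remain in their original files and are copied only when a packet is assembled.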

Figure 14 is an example of a machine readable storage medium that may be 
accessed by a digital processing system, such as a receiving system or other digital 
processing system, according to one embodiment of the invention. It will be 
appreciated that the actual memory that stores the elements shown in and described 
below with reference to Figure 14 may be one or several elements, such as one or 
more disks (which may, for example be magnetic, optical, magneto-optical, etc.), 
the memory 654 and/or the mass memory 662 described above with reference to 
Figure 7. Furthermore, in one embodiment where the receiving system, with 
which the machine readable storage medium shown in Figure 14 is associated, is a 
network computer, one or more of the elements of the machine readable storage 
medium may be stored at another digital processing system and downloaded to the 
receiving system. Furthermore, the elements described with reference to the 
machine readable storage medium may, at some point in time, be stored in a non- 
volatile mass memory (e.g., a hard disk). Conversely, at other times, the elements 
of the machine storage medium may be dispersed between different storage areas, 
such as DRAM, SRAM, disk, etc. 

Figure 14 shows a machine readable storage medium 780. In one 
embodiment, the machine readable storage medium is utilized, at least in part, to 
process media data packetized in accordance with one or more method(s) of the 
invention. The machine readable storage medium 780 may be associated with a 
receiving system, such as the client data processing system 682 described with 
reference to Figures 8 and 9, to include routines to present media data 
transmitted/received according to hints. Alternatively, the machine readable 
storage medium 780 may include media data having hints (e.g., hint tracks) 
embedded therein. Such embedded media data may be pre-packaged or generated 
by a routine stored on a machine readable storage medium, such as the machine 
readable storage medium 780. 

The machine readable storage medium 780 may include a number of 
elements. For example, the machine readable storage medium 780 includes 
software for providing operating system functionality to the receiving system, as 
depicted by a server operating system (OS) 772. A network transmission 
routine(s) 782 provides data communication functionality, such as routines, 
protocols, etc., to allow the receiving system to transmit and receive data via a data 
communication link. 

In addition, the machine readable storage medium 780 includes a media 
presentation routine 778 for presenting media data packetized according to hints. 
Thus, the machine readable storage medium 780, and in particular, the media 
presentation routine 778, may include routines for decompression of audio and/or 
video data, displaying of video, and/or playing back audio, etc. Furthermore, the 
media presentation routine 778 typically provides handling of hints that are 
associated with the media data. In one embodiment, the hints are simply ignored 
as media is presented. 

Optionally, the machine readable storage medium 780 may store media data 
that has been packetized according to hints as media data 774, and include a media 
data reassembly routine 776 to reassemble the stored media data (e.g., to be 
presented, transmitted, etc.). 

Figure 15 is a diagram of a data storage and/or communication medium 
having stored/transported thereon media and hint information, according to one 
embodiment of the invention. A data storage and/or communication medium 
(medium) 800 is shown, which represents various types of transport and/or 
storage medium in which a media data packet 804 and a hint packet 806 packetized 
according to the present invention could be stored or transported. For example, 
the medium 800 may represent the mass memory 662 and/or the memory 654, 
described above with reference to Figure 7. The medium 800 may also represent a 
communication medium, such as the LAN bus 612 shown in Figure 6 or the data 
communication link 686 for transporting data/signals representing media and/or 
other information. 

The hint packet 806 and the media packet 804 may be integrated into one 
packet or be stored and/or transported separately, as depicted in Figure 15. 
Furthermore, the hint packet 806 and the media packet 804 may embody several 
types of formats, such as ones described herein or one associated with other media 
formats, network protocols, and/or digital processing device architecture. 

Provided below are some example formats of hints. It will be appreciated 
that the present invention, however, may be utilized with various types of network 
protocols, digital processing system architectures, media formats, etc., to provide 
transmission of time-based data. 

Alternative Embodiments 

While the invention has been described in terms of several embodiments 
and illustrative figures, those skilled in the art will recognize that the invention is 
not limited to the embodiments or figures described. In particular, the invention 
can be practiced in several alternative embodiments that provide packetization of 
time related media data. 

Therefore, it should be understood that the method and apparatus of the 
invention can be practiced with modification and alteration within the spirit and 
scope of the appended claims. The description is thus to be regarded as illustrative 
instead of limiting on the invention. 

Appendix A - Packetization Hint Sample Description 

In one embodiment of the present invention, each hint track has a table of 
sample descriptions. Hint tracks typically have one sample description. The 
format for each sample description entry for a hint track, according to one 
embodiment of the present invention, is described below in Table 1 . 



Table 1: Hint Track Sample Description Format 

Hint Track Sample Description    Bytes 
Sample description size          4 
Data format                      4 
Reserved                         6 
Data reference index             2 
Max packet size                  4 
Additional data table            variable 



The packetization hint header atom contains the following data elements: 
Field descriptions: 

Sample description size   A 32-bit integer that specifies the number of bytes 
                          in the sample description. 

Data format               A 32-bit integer indicating the format of the hints 
                          stored in the sample data. Different formats may be 
                          defined for different hint types. The table below 
                          lists defined formats. 

Reserved                  Six bytes that are set to 0. 

Data reference index      A 16-bit integer that contains the index of the data 
                          associated with the samples that use this sample 
                          description. Data references are stored in data 
                          reference atoms. 

Max packet size           A 32-bit integer indicating the maximum size of 
                          packets computed in this track. 

Additional data table     A table containing additional information needed 
                          on a per track basis. The values are tagged entries. 
                          There are no required entries. If an entry is not 
                          present in the table, a reasonable default may be 
                          used. 



The structure for the additional data table entries is shown in Table 2. 



Table 2: Additional Data Table Format 

Additional Data Table    Bytes 
Entry length             4 
Data type                4 
Data                     Entry length - 8 



The additional data table entries contain the following data elements: 
Field descriptions: 

Entry length   A 32-bit integer indicating the length of the entire 
               entry (includes 8 bytes for the length and type fields) 
               in bytes. 

Data type      A 32-bit integer indicating the meaning of the data 
               in the entry. 

Data           The data for this entry. The length of the data is 
               given by the Entry length field of the table (the entry 
               length minus 8 bytes). 

The following data tags may be defined for various data format types. Other 
tags may be created as required. 

Length   Type     Data Description 

9        'rely'   A 1 byte integer indicating whether or not this track 
                  should be sent over a reliable transport. Values of 0 
                  and 1 are defined. If this tag is not present, it is 
                  assumed to have the value zero, indicating that it can 
                  be sent over unreliable transports, such as UDP. 
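
As an illustration only, the layouts of Table 1 and Table 2 can be walked with a short routine. The following is a minimal Python sketch, not part of the described embodiments. It assumes big-endian integers (the byte order is not stated in this appendix), and the function name and returned keys are hypothetical. The all-zero terminator check follows the entry described later for MPEG-2 hint tracks and is harmless when no such entry is present.

```python
import struct

def parse_hint_sample_description(buf: bytes) -> dict:
    """Parse one hint track sample description entry (Table 1 layout).

    Assumption: all integers are big-endian; the source text does not say.
    """
    # Fixed part: size (4), data format (4), reserved (6),
    # data reference index (2), max packet size (4) = 20 bytes.
    size, data_format, _reserved, data_ref_index, max_packet_size = \
        struct.unpack_from(">I4s6sHI", buf, 0)

    # Additional data table: tagged entries of (length, type, data), Table 2.
    entries = {}
    pos = 20
    while pos + 8 <= size:
        entry_length, data_type = struct.unpack_from(">I4s", buf, pos)
        if entry_length < 8 or pos + entry_length > size:
            break  # malformed entry; stop rather than read past the end
        if data_type == b"\x00\x00\x00\x00":
            break  # terminator entry, if one is used
        entries[data_type] = buf[pos + 8:pos + entry_length]
        pos += entry_length

    # 'rely' defaults to zero (unreliable transport allowed) when absent.
    reliable = entries.get(b"rely", b"\x00")[0] == 1
    return {
        "data_format": data_format,
        "data_reference_index": data_ref_index,
        "max_packet_size": max_packet_size,
        "reliable": reliable,
        "entries": entries,
    }
```

A caller would typically pass the bytes of a single sample description entry and consult `entries` for any tags specific to the data format.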

The following data format types are defined. New types may be defined as 
needed. 

Data Format   Description 

'rtp'         The packetization hints for sending media over RTP 
              for the specific media type and encoding, as 
              described in various IETF drafts of the Audio-Video 
              Transport (AVT) working group. 

The following data tag is utilized in one embodiment for 'rtp' data. 

Length   Type     Data Description 

12       'tims'   A 32-bit number indicating the RTP 
                  timescale. This tag is present in one 
                  embodiment for RTP data. 

The following data tags are optional for 'rtp' data. 

Length   Type     Data Description 

12       'tsro'   A 32-bit number indicating the 
                  random offset to add to the stored 
                  time stamp when sending the RTP 
                  packets. If this field is not present, a 
                  truly random number should be 
                  used, per the RTP specification. 
                  The value of this field could be zero, 
                  indicating that no random offset is to 
                  be added. 

10       'snro'   A 16-bit number indicating the 
                  random offset to add to the sequence 
                  number when sending the RTP 
                  packets. If this field is not present, a 
                  truly random number should be 
                  used, as per the RTP specification. 
                  The value of this field could be zero, 
                  indicating that no random offset is to 
                  be added. 
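
The handling of the 'tsro' and 'snro' tags can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the tag data is taken to be big-endian, the function names are hypothetical, and the `secrets` module stands in for whatever source of randomness a server actually uses. Wrap-around at 32 and 16 bits reflects the widths of the RTP time stamp and sequence number fields.

```python
import secrets

def choose_offsets(tags: dict) -> tuple:
    """Return (timestamp_offset, sequence_offset) for one RTP hint track.

    `tags` maps a 4-byte tag type to its raw data (assumed big-endian).
    A missing tag means a random offset is chosen by the server.
    """
    if b"tsro" in tags:
        ts_offset = int.from_bytes(tags[b"tsro"], "big")
    else:
        ts_offset = secrets.randbits(32)
    if b"snro" in tags:
        sn_offset = int.from_bytes(tags[b"snro"], "big")
    else:
        sn_offset = secrets.randbits(16)
    return ts_offset, sn_offset

def apply_offsets(stored_time: int, stored_seq: int,
                  ts_offset: int, sn_offset: int) -> tuple:
    """Add the offsets, wrapping at the 32-bit and 16-bit field widths."""
    return ((stored_time + ts_offset) & 0xFFFFFFFF,
            (stored_seq + sn_offset) & 0xFFFF)
```

The offsets would normally be chosen once per stream and then applied to every packet, so that relative timing and ordering are preserved.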

Appendix B - Example hint track for RTP 

This section presents one example of a hint track format for streaming RTP 
from a QuickTime movie. 

In standard RTP, each media stream is typically sent as a separate RTP 
stream. Multiplexing is generally achieved by using IP's port-level multiplexing, 
not by interleaving the data from multiple streams into a single RTP session. 
Therefore each media track in the movie should have an associated RTP hint track. 
In one embodiment of the present invention, each hint track contains a track 
reference back to the media track which it is streaming. 

In this example, the packet size is determined at the time the hint track is 
created. Therefore, in the sample description for the hint track (a data structure 
which can contain fields specific to the 'coding' - which in this case is a protocol), 
the chosen packet size is indicated. In one example of the present invention, 
several RTP hint tracks are provided for each media track to provide different 
packet size choices. Other protocols may be parameterized as well. Similarly, the 
appropriate time-scale for the RTP clock is provided in the sample description 
below. 

The hint track is related to its base media track by a single track reference 
declaration. (RTP does not permit multiplexing of media within a single RTP 
stream). The sample description for RTP declares the maximum packet size which 
this hint track will generate. Session description (SAP/SDP) information is stored 
in user-data atoms in the track. 

Each sample in the RTP hint track contains the instructions to send out a 
set of packets which must be emitted at a given time. The time in the hint track is 
emission time, not necessarily the media time of the associated media. 

In the following description, the internal structure of samples is described. 
Samples, which are media data, not meta data, in the terminology of this example, 
need not be structured as objects. 

In this example, each sample contains two areas: the instructions to 
compose the packets, and any extra data needed when sending those packets (e.g. 
an encrypted version of the media data). 

struct RTPsample { 
    int(16) packetcount; 
    RTPpacket packets[packetcount]; 
    byte[] extradata; 
} 

Each RTP hint packet contains the information to send a single packet. In 
one embodiment, to separate media time from emission time, an RTP time stamp is 
specifically included, along with data needed to form the RTP header. In 
alternative embodiments, however, this is not the case. Other header information 
is typically supplied. A table of construction entries is constructed as follows: 

struct RTPpacket { 
    int(32) RTPtime; 
    int(16) partialRTPheader; 
    int(16) RTPsequenceseed; 
    int(16) entrycount; 
    dataentry constructors[entrycount]; 
} 

There are various forms of the constructor. Each constructor is 16 bytes, 
which may make iteration relatively simple. The first byte is a union discriminator: 

struct dataentry { 
    int(8) entrytype; 
    switch entrytype { 
        case immediate: 
            int(8) bytecount; 
            int(8) bytestocopy[bytecount]; 
        case mediasample: 
            int(8) reserved[5]; 
            int(16) length; 
            int(32) mediasamplenumber; 
            int(32) mediasampleoffset; 
        case hintsample: 
            int(8) reserved[5]; 
            int(16) length; 
            int(32) hintsamplenumber; 
            int(32) hintsampleoffset; 
    } 
} 

The immediate mode permits the insertion of payload-specific headers (e.g. 
the RTP H.261 header). For hint tracks where the media is sent 'in the clear', the 
mediasample entry may specify the bytes to copy from the media track, by giving 
the sample number, data offset, and length to copy. For relatively complex cases 
(e.g. encryption or forward error correction), the transformed data may be placed 
into the hint samples, and hintsample mode may then be used, in which case the 
data would be provided from the extradata field in the RTPsample itself. 

In one example of the present invention, there is no requirement that 
successive packets transmit successive bytes from the media stream. For example, 
to conform with RTP-standard packing of H.261, in one example of the present 
invention, a byte may be sent at the end of one packet and also at the beginning of 
the next (when a macroblock boundary falls within a byte). 



Appendix C - Packetization Hint Sample Data for Data Format 'rtp' 

This appendix provides a description of the sample data for the 'rtp' 
format, according to one embodiment of the invention. The 'rtp' format assumes 
that a server is sending data using Real Time Transport Protocol (RTP). This 
format assumes that the server knows about RTP headers, but does not require 
that the server know anything about specific media headers, including media 
headers defined in various IETF drafts. 

In one embodiment of the present invention, each sample in the hint track 
will generate one or more RTP packets. Each entry in the sample data table in a 
hint track sample corresponds to a single RTP packet. Samples in the hint track 
may or may not correspond exactly to samples in the media track. In one 
embodiment of the present invention, data in the hint track sample is byte aligned, 
but not 32-bit aligned. 

Field descriptions: 

Entry count          A 16-bit unsigned integer indicating the number of 
                     packet entries in the table. Each entry in the table 
                     corresponds to a packet. Multiple entries in a single 
                     sample indicate that the media sample had to be split 
                     into multiple packets. A sample with an entry count 
                     of zero is reserved and, if encountered, should be 
                     skipped. 

Packet entry table   A variable length table containing packet entries. 
                     Packet entries are defined below. 

Additional data      A variable length field containing data pointed to by 
                     the entries in the data table shown below by Table 3: 

Table 3 - Additional Data 

Packet Entry                         Bytes 
Relative packet transmission time    4 
Flags                                4 
RTP header info                      2 
RTP sequence number                  2 
Entry count                          2 
Data table                           variable 



In one embodiment, the packet entry contains the following data elements: 

Field descriptions: 

Relative packet      A 32-bit signed integer value, indicating the time, 
transmission time    in the hint track's timescale, to send this packet relative 
                     to the hint sample's actual time. Negative values 
                     mean that the packet will be sent earlier than real 
                     time, which is useful for smoothing the data rate. 
                     Positive values are useful for repeating packets at 
                     later times. Within each hint sample track, each 
                     packet time stamp is nondecreasing. 

Flags                A 32-bit field indicating certain attributes for this 
                     packet. 

The flags field contains the following element: 

Field   Bit #   Description 

R       31      A 1-bit number indicating that this is a 
                repeat packet - the data has been defined in a 
                previous packet. A server may choose to skip 
                repeat packets to help it catch up when it is behind 
                in its transmission of packets. All repeated packets 
                for a given packet are in the same hint sample. 

All undefined bits (0-30) are reserved and are set to zero. 

RTP header info      A 16-bit integer specifying various values to be set 
                     in the RTP header. 

The RTP header information field contains the following elements: 

Field          Bit #   Description 

P              2       A 1-bit number corresponding to the 
                       padding (P) bit in the RTP header. This bit 
                       may not be set, since a server that needed 
                       different packet padding may generally need 
                       to un-pad and re-pad the packet itself. 

X              3       A 1-bit number corresponding to the 
                       extension (X) bit in the RTP header. This 
                       bit may not be set, since a server that needs 
                       to send its own RTP extension may either 
                       not be able to, or may be forced to replace 
                       any extensions from the hint track. 

M              8       A 1-bit number corresponding to the marker 
                       (M) bit in the RTP header. 

payload type   9-15    A 7-bit number corresponding to the 
                       payload type (PT) field of the RTP header. 

All undefined bits (0-1 and 4-7) are reserved and are set to zero. 
The defined bits are in the same bit locations as in the 
RTP header. 
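
The bit positions listed above can be illustrated with a short decoder. The following Python sketch assumes that bit 0 is the most significant bit of the 16-bit field, which is the numbering under which the listed positions coincide with the first two bytes of an RTP header; the function names are hypothetical. The helper that forms the header bytes assumes RTP version 2 and a CSRC count of zero.

```python
def decode_rtp_header_info(info: int) -> dict:
    """Split the 16-bit RTP header info field into its defined bits.

    Assumption: bit 0 is the most significant bit, as in RTP header diagrams.
    """
    return {
        "padding": (info >> 13) & 1,    # bit 2
        "extension": (info >> 12) & 1,  # bit 3
        "marker": (info >> 7) & 1,      # bit 8
        "payload_type": info & 0x7F,    # bits 9-15
    }

def first_two_header_bytes(info: int, version: int = 2) -> bytes:
    """Form the first two RTP header bytes from the hint's header info.

    Reserved bits are masked off; the version occupies bits 0-1 and the
    CSRC count (bits 4-7) is left at zero in this sketch.
    """
    word = ((version & 0x3) << 14) | (info & 0x30FF)
    return word.to_bytes(2, "big")
```

Because the defined bits already sit in their header positions, a server following this sketch only needs to supply the version bits and any fields the hint leaves reserved.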

RTP sequence   A 16-bit integer specifying the RTP sequence number for 
number         the packet. The RTP server adds a random offset to this 
               sequence number before transmitting the packet. This field 
               allows re-transmission of packets, e.g., the same packet 
               can be assembled with the same sequence number and a 
               different (later) packet transmission time. For example, a 
               text sample with a duration of 5 minutes can be 
               retransmitted every 10 seconds so that clients that miss the 
               original sample transmission (perhaps they started playing a 
               movie in the middle) will be "refreshed" after a maximum 
               of 10 seconds. 

Entry count    A 16-bit unsigned integer specifying the number of entries 
               in the data table. 

Data table     A table that defines the data to be put in the payload portion 
               of the RTP packet. This table defines various places the 
               data can be retrieved, and is shown by Table 4. 

Table 4 - Data Table 

Data table entry    Bytes 
Data source         1 
Data                15 



The data source field of the entry table indicates how the other 15 bytes of the 
entry are to be interpreted. Values of 0 through 4 are defined. The various data 
table formats are defined below. Although there are various schemes, the entries 
in the various schemes are typically 16 bytes long. 

No-Op Data Mode 

The data table entry has the following format for no-op mode: 
Field description: 

Data source = 0 A value of zero indicates that this data table entry is 

to be ignored. 

Immediate Data Mode 

The data table entry has the following format for immediate mode: 
Field description: 

Data source = 1    A value of one indicates that the data is to be 
                   immediately taken from the bytes of data that 
                   follow. 

Immediate length   An 8-bit integer indicating the number of bytes to 
                   take from the data that follows. Legal values range 
                   from 0 to 14. 

Immediate data     14 bytes of data to place into the payload portion of 
                   the packet. Only the first number of bytes indicated 
                   by the immediate length field are used. 



Sample Mode 

The data table entry has the following format for sample mode: 
Field description: 

Data source = 2     A value of two indicates that the data is to be taken from 
                    a track's sample data. 

Track ref index     A value that indicates which track the sample data will 
                    come from. A value of zero means that there is exactly 
                    one media track referenced, which is to be used. 
                    Values from 1 to 127 are indices into the hint track 
                    reference atom entries, indicating from which original 
                    media track the sample is to be read. A value of -1 
                    means the hint track itself, i.e., the sample from the 
                    same track as the hint sample currently being parsed is 
                    used. 

Bytes per           A 16-bit unsigned integer specifying the number of 
compression block   bytes that results from compressing the number of 
                    samples in the Samples per compression block field. A 
                    value of zero is equivalent to a value of 1. 

Samples per         A 16-bit unsigned integer specifying the uncompressed 
compression block   samples per compression block. A value of zero is 
                    equivalent to a value of 1. 

Length              A 16-bit integer specifying the number of bytes in the 
                    sample to copy. 

Sample Number       A 32-bit integer specifying sample number of the track. 

Offset              A 32-bit integer specifying the offset from the start of 
                    the sample from which to start copying. If referencing 
                    samples in the hint track, this will generally point into 
                    the Additional Data area. 

If the bytes per compression block and/or the samples per compression 
block is greater than 1, then this ratio is used to translate a sample number into an 
actual byte offset. This ratio mode is typically used for compressed audio tracks in 
QuickTime movies, such that: 

CB = NS * BPCB / SPCB 

wherein, 

CB = compressed bytes 
NS = number of samples 
BPCB = bytes per compression block 
SPCB = samples per compression block 

For example, a GSM compression block is typically 160 samples packed into 33 
bytes. Therefore, BPCB = 33 and SPCB = 160. Suppose the hint sample requests 33 
bytes of data starting at the 161st media sample, and the first QuickTime chunk 
contains at least 320 samples. After determining that this data 
will come from chunk 1, and where chunk 1 starts, the ratio is utilized to adjust the 
offset into the file where the requested samples will be found: 

chunk_number = 1; /* calculated by walking the sample-to-chunk atom */ 
first_sample_in_this_chunk = 1; /* also calculated from that atom */ 
chunk_offset = chunk_offsets[chunk_number]; /* from the stco atom */ 
data_offset = (sample_number - first_sample_in_this_chunk) * BPCB / SPCB; 
read_from_file(chunk_offset + data_offset, length); /* read our data */ 
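
The GSM example can be checked numerically. The following Python sketch restates the ratio calculation; the function name is hypothetical, and the chunk offset of 1000 used in the usage note is a made-up value chosen only for illustration. Zero values for either block field are treated as 1, as stated in the field descriptions.

```python
def compressed_byte_offset(sample_number: int, first_sample_in_chunk: int,
                           bpcb: int, spcb: int) -> int:
    """Translate a sample number into a byte offset within its chunk.

    bpcb = bytes per compression block, spcb = samples per compression block.
    A value of zero for either field is equivalent to a value of 1.
    """
    bpcb = bpcb or 1
    spcb = spcb or 1
    return (sample_number - first_sample_in_chunk) * bpcb // spcb

# GSM: 160 samples are packed into 33 bytes; request starts at sample 161.
data_offset = compressed_byte_offset(161, 1, 33, 160)
file_offset = 1000 + data_offset  # 1000 is a hypothetical chunk offset
```

With these inputs the 161st sample begins exactly one compression block (33 bytes) into the chunk, so `data_offset` is 33 and `file_offset` is 1033.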



Sample Description Mode 

The data table entry has the following format for sample description mode: 
Field description: 

Data source = 3      A value of three indicates that the data is to be taken 
                     from the media track's sample description table. 

Track ref index      A value that indicates which track the sample data will 
                     come from. A value of zero means that there is exactly 
                     one hint track reference, which is to be used. Values 
                     from 1 to 127 are indices into the hint track reference 
                     atom entries, indicating from which original media track 
                     the sample is to be read. A value of -1 means the hint 
                     track itself, i.e., the sample description from the same 
                     track as the hint sample currently being parsed is 
                     utilized. 

Reserved             Four bytes that are set to zero. 

Length               A 16-bit integer specifying the number of bytes in the 
                     sample to copy. 

Sample description   A 32-bit integer specifying the index into the media's 
index                sample description table. 

Offset               A 32-bit integer specifying the offset from the start of 
                     the sample from which to start copying. 

Additional data      A variable length field containing data pointed to by hint 
                     track sample mode entries in the data table. 
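
The four data table modes described in this appendix can be decoded with one routine over the 16-byte entry. The following Python sketch is illustrative only. It assumes big-endian integers, a signed 8-bit track ref index, and that the fields are stored in the order in which they are listed above (which makes both the sample mode and the sample description mode total 16 bytes); the function name and dictionary keys are hypothetical.

```python
import struct

def decode_data_entry(entry: bytes) -> dict:
    """Decode one 16-byte data table entry by its data source byte."""
    if len(entry) != 16:
        raise ValueError("a data table entry is 16 bytes long")
    source = entry[0]
    if source == 0:                      # no-op mode: entry is ignored
        return {"mode": "no-op"}
    if source == 1:                      # immediate mode
        count = entry[1]
        if count > 14:
            raise ValueError("immediate length must be 0 to 14")
        return {"mode": "immediate", "data": entry[2:2 + count]}
    if source == 2:                      # sample mode
        ref, bpcb, spcb, length, number, offset = struct.unpack_from(
            ">bHHHII", entry, 1)
        return {"mode": "sample", "track_ref": ref,
                "bytes_per_block": bpcb or 1,    # zero is equivalent to 1
                "samples_per_block": spcb or 1,  # zero is equivalent to 1
                "length": length, "sample_number": number, "offset": offset}
    if source == 3:                      # sample description mode
        ref, _reserved, length, index, offset = struct.unpack_from(
            ">b4sHII", entry, 1)
        return {"mode": "sample description", "track_ref": ref,
                "length": length, "description_index": index,
                "offset": offset}
    raise ValueError("data source %d is not described here" % source)
```

Keeping every entry at a fixed 16 bytes means a server can step through a packet's data table with simple pointer arithmetic and dispatch on the first byte.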

Appendix D - Example hint track format for MPEG-2 Transport 

This section presents one example of a simple track format for streaming 
MPEG-2 transport from a QuickTime movie holding elementary streams. 

An MPEG-2 transport stream is associated with a multiplex of one or more 
elementary streams. For this reason, an MPEG-2 transport hint track describes 
how to construct such a multiplex from one or more media tracks. There is not 
necessarily a one to one relationship between media tracks and MPEG-2 transport 
hint tracks. Each hint track may contain references to the elementary streams it 
represents. In one example of the present invention, a QuickTime file might 
contain multiple such hint tracks to describe different multiplexes. 

Packet size is generally not an issue, since all MPEG-2 transport packets 
are 188 bytes in size. In one example of the present invention, each transport 
packet (in the MPEG-2 transport protocol) contains payload data from one media 
track. This allows for a relatively simple hint description for each transport 
packet. In one example of the present invention, each such hint describes which 
header data appears on each transport packet, and then points to the payload in the 
appropriate media track for the transport packet. For packets which do not 
correspond with a media track, such as PSI packets, the hint may describe 188 
bytes of header data, and any media track reference may be considered irrelevant. 
For packets which do correspond with a media track, the header data may account 
for information such as transport headers, possible adaptation headers, and PES 
headers for transport packets that begin PES packets. 

Reference is made to the MPEG-2 transport hint track in the Sample 
Description Atom (of type 'stsd'). This atom includes a sample description table, 
and the entries in this table differ based on the media type. In one example of the 
present invention, hint tracks begin with the structure shown in Table 1. The 
additional data table may hold entries with the structure shown in Table 2. 

In one example of the present invention, if the hint track is an MPEG-2 
transport hint track, the data format in the hint track sample description entry will 
be 'm2t ' and the max packet size will always be 188. In such a description entry, 
the types shown below in Tables 5-7 may be found in the additional data table: 

Table 5 - Additional Data Table Entries 

Entry length   Data type    Data description 

8              0x00000000   Indicates there are no more entries in the table. 

9              'otyp'       Describes how offsets are described in the 
                            hints. The one byte of data has values 
                            described below in Table 6. This entry is 
                            mandatory in the additional data table. 

9              'msns'       Describes the size of media sample numbers. 
                            The one byte of data indicates how many bytes 
                            are used to specify media sample numbers. If 
                            this is not present, and media sample numbers 
                            are present in the sample data, the default 
                            value is 4 bytes. 

9              'msos'       Describes the size of media sample offsets. 
                            The one byte of data indicates how many bytes 
                            are used to specify media sample offsets. If 
                            this is not present, and media sample offsets 
                            are present in the sample data, the default 
                            value is 4 bytes. 

9              'fosz'       Describes the size of file offsets. The one 
                            byte of data indicates how many bytes are used 
                            to specify file offsets within samples. If this is 
                            not present, and file offsets are present in the 
                            sample data, the default value is 4 bytes. 

Variable       'tmap'       Describes an abbreviated mapping of media 
                            tracks. Each 5 byte entry maps a 4 byte track 
                            ID to a 1 byte track reference number. This 
                            limits any given transport mux to containing 
                            no more than 256 media tracks, but this should 
                            not be a limiting factor, and this compression 
                            is useful in limiting the size of the hint track. 
                            The format of these 5 byte entries is specified 
                            below in Table 7. This entry is mandatory 
                            in the additional data table. 



Table 6 - 'otyp' Values in the Additional Data Table 

Value   Description 
0       Samples are described in terms of media samples 
1       Samples are described in terms of file offsets 

Table 7 - Format of Entries in the 'tmap' Additional Data Entry 

Length   Description 
4        Original Track ID 
1        Abbreviated track reference number used in samples 

In one example of the present invention, each hint sample describes one 
transport packet. Each transport packet can be described as some amount of 
header data, followed by some amount of payload from one media track. Since 
MPEG-2 transport packets are relatively small, a large number of hint samples 
may be generated, and thus, these samples preferably should be as small as 
possible. Several entries in the additional data table above may be used to 
minimize the size of samples, but such factors may make some of the fields in the 
sample entries variable in size. 
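
To make the role of these entries concrete, the following Python sketch reads an MPEG-2 transport additional data table into a small settings dictionary, applying the 4-byte defaults when 'msns', 'msos' or 'fosz' is absent. It is a sketch under assumptions: integers are taken to be big-endian, the input is the additional data table alone (without the fixed fields of Table 1), and the function name and dictionary layout are hypothetical.

```python
import struct

def parse_m2t_additional_data(table: bytes) -> dict:
    """Collect 'otyp', field sizes and the 'tmap' mapping from the table."""
    settings = {"otyp": None, "msns": 4, "msos": 4, "fosz": 4, "tmap": {}}
    pos = 0
    while pos + 8 <= len(table):
        entry_length, data_type = struct.unpack_from(">I4s", table, pos)
        if entry_length < 8 or data_type == b"\x00\x00\x00\x00":
            break  # malformed entry, or the "no more entries" marker
        data = table[pos + 8:pos + entry_length]
        if data_type in (b"otyp", b"msns", b"msos", b"fosz") and data:
            settings[data_type.decode("ascii")] = data[0]
        elif data_type == b"tmap":
            # Each 5-byte entry: 4-byte track ID, 1-byte reference number.
            for i in range(0, len(data) - 4, 5):
                track_id, ref = struct.unpack_from(">IB", data, i)
                settings["tmap"][ref] = track_id
        pos += entry_length
    return settings
```

A caller would check that `otyp` is not `None` and that `tmap` is non-empty, since both entries are described as mandatory.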

If the 'otyp' entry in the data table has the value 0, indicating that payload 
data is described in terms of media samples, hint samples may be of the following 
form shown in Table 8: 



Table 8 - Hint Sample Format Using Media Sample References 

Length     Description 

1          Track reference number of the media track holding the payload 
           data for this packet. This can be mapped to a track ID using the 
           'tmap' entry in the additional data table. If the hint specifies 
           188 bytes of immediate data, this field is irrelevant. 

1          The length of the immediate data for the packet. Note that this 
           must be 188 or less, since transport packets are 188 bytes in 
           length. 

Variable   Bytes of immediate data to be used as the header for the 
           transport packet. The number of bytes is described by the 
           previous field. 

Variable   The media sample number to use for the payload data. The 
           default size of this field is 4 bytes, but may be modified by the 
           presence of an 'msns' entry in the additional data table. 

Variable   The media sample offset to use for the payload data. The default 
           size of this field is 4 bytes, but may be modified by the presence 
           of an 'msos' entry in the additional data table. 



In one example of the present invention, it is not necessary to indicate the 
length of the payload data for the packet since in MPEG-2, this length is equal to 
188 minus the size of the header data for the packet. 

If the 'otyp' entry in the data table has the value 1, indicating that payload 
data is described in terms of file offsets, hint samples may be of the following 
form shown in Table 9: 



Table 9 

Length     Description 

1          Track reference number of the media track holding the payload 
           data for this packet. This can be mapped to a track ID using the 
           'tmap' entry in the additional data table. If the hint specifies 
           188 bytes of immediate data, this field is irrelevant. 

1          The length of the immediate data for the packet. Note that this 
           must be 188 or less since transport packets are 188 bytes in 
           length. 

Variable   Bytes of immediate data to be used as the header for the 
           transport packet. The number of bytes is described by the 
           previous field. 

Variable   The file offset where the payload data is located. This offset is 
           in the file where the data for the media track is located. The 
           default size of this field is 4 bytes, but may be modified by the 
           presence of an 'fosz' entry in the additional data table. 



In one example of the present invention, hint samples may describe their 
offsets in terms of media samples or in terms of file offsets. Each of these has 
advantages and disadvantages. If hint samples specify payload in terms of media 
samples, they may be more resilient to additional editing of the file containing the 
media track, but may require additional processing for delivery. If hint samples 
specify payload in terms of file offsets, the payload data can be accessed relatively 
quickly, but any editing of the file containing the media track may invalidate the 
hints. 
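
Both hint sample forms (Table 8 and Table 9) can be read by one routine that switches on the 'otyp' value. The following Python sketch is illustrative only: it assumes big-endian variable-width integers, the function name and dictionary keys are hypothetical, and it assumes that any bytes following 188 bytes of immediate data can be ignored, which the text above does not address. The payload length is derived as 188 minus the header length, as described earlier.

```python
def parse_m2t_hint_sample(sample: bytes, otyp: int,
                          msns: int = 4, msos: int = 4, fosz: int = 4) -> dict:
    """Decode one MPEG-2 transport hint sample (one transport packet)."""
    track_ref = sample[0]
    header_len = sample[1]
    if header_len > 188:
        raise ValueError("immediate data cannot exceed 188 bytes")
    hint = {
        "track_ref": track_ref,
        "header": sample[2:2 + header_len],
        "payload_length": 188 - header_len,  # implied, never stored
    }
    if hint["payload_length"] == 0:
        return hint  # assumption: no payload reference needs to be read
    pos = 2 + header_len
    if otyp == 0:    # payload located by media sample number and offset
        hint["sample_number"] = int.from_bytes(sample[pos:pos + msns], "big")
        pos += msns
        hint["sample_offset"] = int.from_bytes(sample[pos:pos + msos], "big")
    elif otyp == 1:  # payload located by file offset
        hint["file_offset"] = int.from_bytes(sample[pos:pos + fosz], "big")
    else:
        raise ValueError("unknown 'otyp' value")
    return hint
```

The trade-off described above shows up directly here: the file offset form yields a position that can be read immediately, while the media sample form still requires a lookup through the media track's sample tables before the payload can be fetched.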



Appendix D - An example file 

Provided below is a relatively short (six frame) sample file, with some of 
the relatively less important fields and objects left out (marked here by ellipsis 
"..."), and with some fictitious numbers to illustrate the overall structure of a file 
which is ready for streaming over RTP, according to one embodiment of the 
present invention. The media data has been left out; only the meta-data is shown. 



moov                                -- the entire movie meta-data 
  mvhd                              -- overall movie information 
    TIME-SCALE       600 
    DURATION         2792 
    PREFERRED-RATE   1 
    VOLUME           255 
    MATRIX           [[1 0 0] [0 1 0] [0 0 1]] 
    NEXT-TRACK-ID    5              -- tracks 1 to 4 are here 
  trak                              -- this is the video track 
    tkhd 
      TRACK-ID       1 
      DURATION       2792 
      LAYER          0 
      MATRIX         [[1 0 0] [0 1 0] [0 0 1]] 
      WIDTH          176 
      HEIGHT         144 
    mdia 
      mdhd 
        TIME-SCALE   600 
        DURATION     2722 
      hdlr                          -- we use the basic video media handler 
        TYPE         mhlr 
        SUBTYPE      vide 
        MANUFACT     appl 
        NAME         Apple Video Media Handler 
      minf 
        vmhd 
        hdlr                        -- basic 'alias' disk data handler gets the data 
          TYPE       dhlr 
          SUBTYPE    alis 
          MANUFACT   appl 
          NAME       Apple Alias Data Handler 
        dinf 
          dref 
            ENTRY-COUNT    1 
            REFS           [Pointer to this file] 
        stbl                        -- the complete sample table 
          stsd                      -- the sample description(s) 
            ENTRY-COUNT    1 
            DESCRIPTIONS   [video sample description] 
          stts                      -- convert time to sample 
            ENTRY-COUNT    6 
            TIMETOSAMPLE   ((1 200)     -- count, duration 
                            (1 251) 
                            (1 479) 
                            (1 531) 
                            (1 1022) 
                            (1 239)) 
          stss                      -- 'sync' or key sample numbers 
            ENTRY-COUNT    1 
            SYNCSAMPLES    (1) 
          stsc                      -- sample to chunk 
            ENTRY-COUNT    1 
            SAMPLETOCHUNK  ((1 1 1)) 
                                    -- 1st chunk, samples/chunk, desc. number 
          stsz                      -- sample sizes 
            DEFSAMPLESIZE  0        -- no default size, all different 
            ENTRY-COUNT    6 
            SAMPLESIZES    (664 
                            616 
                            1176 
                            1304 
                            2508 
                            588) 
          stco                      -- chunk offsets into file 
            ENTRY-COUNT    6 
            CHUNKOFFSETS   (4743 
                            5407 
                            8010 
                            12592 
                            17302 
                            25268) 



  trak -- this is the sound track
    tkhd
      TRACK-ID 2
      DURATION 2792
      VOLUME
    mdia
      mdhd
        TIME-SCALE 8000
        DURATION 37280
        LANGUAGE US English
      hdlr -- handled by the basic sound handler
        TYPE mhlr
        SUBTYPE soun
        MANUFACT appl
        NAME Apple Sound Media Handler
      minf
        smhd
          BALANCE 0
        hdlr -- data fetched by usual disc data handler
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl -- sample table for the sound
          stsd -- sample descriptions
            ENTRY-COUNT 1
            DESCRIPTIONS [Sound sample description, incl GSM]
          stts -- time to sample table
            -- sound is measured by uncompressed samples
            ENTRY-COUNT 1
            TIMETOSAMPLE ((37280 1))
          stsc
            ENTRY-COUNT 2
            SAMPLETOCHUNK ((1 4000 1)
                           (10 1280 1))
            -- first chunk, samples/chunk, desc. number
          stsz
            DEFSAMPLESIZE 1 -- all samples same size
            ENTRY-COUNT 37280
          stco -- chunk offset table
            ENTRY-COUNT 10
            CHUNKOFFSETS (3093 3918 6023 9186 10915 13896 ...)

  trak -- the RTP hints for the video track
    tkhd
      TRACK-ID 3
      DURATION 2792
    tref
      hint -- references the video track
        TRACKIDS (1)
    mdia
      mdhd
        TIME-SCALE 600
        DURATION 2792
      hdlr -- is 'played' by the hint media handler
        TYPE mhlr
        SUBTYPE hint
        MANUFACT appl
        NAME hint media handler
      minf
        gmhd
        hdlr -- if played, the regular disc handler would fetch data
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl -- samples describe packets
          stsd
            ENTRY-COUNT 1
            DESCRIPTIONS [hint sample description]
          stts -- one packet per frame for video
            ENTRY-COUNT 6
            TIMETOSAMPLE ((1 270)
                          (1 251)
                          (1 479)
                          (1 531)
                          (1 1022)
                          (1 239))
          stss -- key samples derive from video
            ENTRY-COUNT 1
            SYNCSAMPLES (1)
          stsc -- sample to chunk table
            ENTRY-COUNT 1
            SAMPLETOCHUNK ((1 1 1))
          stsz -- sample sizes (packet instructions)
            DEFSAMPLESIZE 0
            ENTRY-COUNT 6
            SAMPLESIZES (52 52 52 52 102 52)
          stco -- chunk offsets
            ENTRY-COUNT 6
            CHUNKOFFSETS (6848 6900 10011 14721 20635 25856)
    udta -- track is named for ease of identification
      name
        NAME Hinted Video Track

  trak -- the RTP hints for the sound track
    tkhd
      TRACK-ID 4
    tref -- references the sound track
      hint
        TRACKIDS (2)
    mdia
      mdhd
        TIME-SCALE 8000
        DURATION 37120
      hdlr
        TYPE mhlr
        SUBTYPE hint
        MANUFACT appl
        NAME hint media handler
      minf
        gmhd
        hdlr
          TYPE dhlr
          SUBTYPE alis
          MANUFACT appl
          NAME Apple Alias Data Handler
        dinf
          dref
            ENTRY-COUNT 1
            REFS [Pointer to this file]
        stbl
          stsd
            ENTRY-COUNT
            DESCRIPTIONS [hint sample description]
          stts -- time to sample
            ENTRY-COUNT
            TIMETOSAMPLE ((1 960)
                          (7 4000)
                          (1 1120)
                          (1 7040))
          stsc
            ENTRY-COUNT 1
            SAMPLETOCHUNK ((1 1 1))
          stsz
            DEFSAMPLESIZE 0
            ENTRY-COUNT 10
            SAMPLESIZES (206 852 852 852 852 852 ...)
          stco
            ENTRY-COUNT 10
            CHUNKOFFSETS (6952 7158 10063 11740 14773 16450 ...)
    udta
      NAME Hinted Sound Track
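
The figures in the listing above can be cross-checked against one another. The following Python sketch is an illustration only and is not part of the described embodiment: the function names are hypothetical, the tables are copied by hand from the listing, and the first time-to-sample entry of the video hint track (printed as "27 0") is assumed to read 270. Under those assumptions, the time-to-sample durations of each hint track add up to that track's mdhd DURATION, the sample-to-chunk runs of the sound track account for its 37280 samples, and the video chunks and video hint chunks occupy disjoint byte ranges in the file.

```python
# Values copied from the listing above; the source calls them fictitious.
# Assumption: the OCR-split "27 0" in the video hint track is read as 270.
VIDEO_HINT_STTS = [(1, 270), (1, 251), (1, 479), (1, 531), (1, 1022), (1, 239)]
SOUND_HINT_STTS = [(1, 960), (7, 4000), (1, 1120), (1, 7040)]
SOUND_STSC = [(1, 4000, 1), (10, 1280, 1)]  # (first chunk, samples/chunk, desc. number)

# Both tracks below hold one sample per chunk, so each stsz size is a chunk size.
VIDEO_CHUNKS = list(zip(
    [4743, 5407, 8010, 12592, 17302, 25268],  # stco chunk offsets
    [664, 616, 1176, 1304, 2508, 588],        # stsz sample sizes
))
VIDEO_HINT_CHUNKS = list(zip(
    [6848, 6900, 10011, 14721, 20635, 25856],
    [52, 52, 52, 52, 102, 52],
))


def total_duration(stts):
    """Sum an stts table of (sample count, sample duration) pairs."""
    return sum(count * duration for count, duration in stts)


def total_samples(stsc, chunk_count):
    """Count the samples described by stsc runs, given the number of chunks."""
    total = 0
    for index, (first_chunk, per_chunk, _desc) in enumerate(stsc):
        # A run lasts until the chunk before the next run, or the final chunk.
        if index + 1 < len(stsc):
            last_chunk = stsc[index + 1][0] - 1
        else:
            last_chunk = chunk_count
        total += (last_chunk - first_chunk + 1) * per_chunk
    return total


def spans_overlap(spans):
    """Return True if any two (offset, size) byte ranges overlap."""
    ordered = sorted(spans)
    return any(
        offset + size > next_offset
        for (offset, size), (next_offset, _size) in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    video_units = total_duration(VIDEO_HINT_STTS)
    sound_units = total_duration(SOUND_HINT_STTS)
    print("video hint duration:", video_units, "units,", video_units / 600, "s")
    print("sound hint duration:", sound_units, "units,", sound_units / 8000, "s")
    print("sound samples:", total_samples(SOUND_STSC, chunk_count=10))
    print("overlap:", spans_overlap(VIDEO_CHUNKS + VIDEO_HINT_CHUNKS))
```

Dividing a duration by the TIME-SCALE of the same mdhd gives seconds, which is how the 600-unit video time base and the 8000-unit sound time base can be compared. These are sanity checks on the tabulated numbers, not a parser for the file format.
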

CLAIMS 

What is claimed is: 

1. A machine readable medium having stored therein: 

a set of data comprising at least one instruction to cause a digital processing 
system to generate a set of packets representing a time related 
sequence of media data, wherein said set of packets is associated 
with a transmission protocol, and wherein said set of data 
comprises a time related sequence of data associated with said time 
related sequence of media data. 

2. The machine readable medium of claim 1, wherein said set of data is 
associated with a track of indicating data, and wherein said transmission protocol 
comprises a packet data protocol. 

3. The machine readable medium of claim 1, wherein said set of data is 
further associated with a QuickTime file. 

4. The machine readable medium of claim 1, wherein said set of data is a time 
related sequence of data associated with and separate from at least one of a 
sequence of audio data and video data of said time related sequence of media data. 

5. The machine readable medium of claim 1, wherein said set of data indicates 
a timing parameter for transmitting said set of packets over a data communication 
medium. 

6. The machine readable medium of claim 1, wherein said set of data includes 
instructions to provide multiplexing of information into said set of packets for 
transmission over a data communication medium. 

7. The machine readable medium of claim 1, wherein said transmission 
protocol comprises an Internet protocol. 

8. A machine readable medium having stored thereon: 

information to cause a digital processing system to transmit a first time 
related sequence associated with media data according to a 
transmission protocol, and wherein said information comprises a 
second time related sequence associated with and separate from 
said first time related sequence. 

9. The machine readable storage medium of claim 8, wherein said information 
is associated with a track of indicating data, and wherein said transmission 
protocol comprises a packet data protocol. 

10. The machine readable medium of claim 8, wherein said information is 
associated with a QuickTime file. 

11. The machine readable medium of claim 8, wherein said information 
indicates a timing parameter for transmitting said first time related sequence as a set 
of packets to be transmitted over a data communication medium. 

12. The machine readable medium of claim 8, wherein said information 
includes instructions to provide multiplexing of said first time related sequence into 
a set of packets to be transmitted over a data communication medium. 

13. The machine readable medium of claim 8, wherein said transmission 
protocol comprises an Internet protocol. 

14. The machine readable medium of claim 8, comprising a magnetic storage 
medium. 

15. The machine readable medium of claim 8, comprising an electronic storage 
medium. 

16. The machine readable medium of claim 8, comprising an optical storage 
medium. 

17. A computer readable medium containing media data and information which 
indicates how to transmit said media data from a digital processing system, said 
computer readable medium comprising: 

a first time related sequence of data; 

a second time related sequence of data which indicates how to transmit said 
first time related sequence of media data in a data communication 
medium, wherein said second time related sequence of data is 
associated with said first time related sequence of data. 

18. The computer readable medium of claim 17, wherein said second time 
related sequence of data is associated with a track of indicating data, and wherein 
said transmission protocol comprises a packet data protocol. 

19. The computer readable medium of claim 17, wherein said second time 
related sequence of data is associated with a QuickTime file. 

20. The computer readable medium of claim 17, wherein said second time 
related sequence of data indicates a timing parameter for transmitting said time 
related sequence of media data as a set of packets to be transmitted over a data 
communication medium. 

21. The computer readable medium of claim 17, wherein said set of data 
includes instructions to provide multiplexing of said time related sequence into a 
set of packets to be transmitted over a data communication medium. 

22. The computer readable medium of claim 17, wherein said transmission 
protocol comprises an Internet protocol. 

23. The computer readable medium of claim 17, comprising a magnetic storage 
medium. 

24. The computer readable medium of claim 17, comprising an electronic 
storage medium. 

25. The computer readable medium of claim 17, comprising an optical storage 
medium. 

26. A data signal embodied in a carrier wave comprising: 

a time related sequence of media data associated with a set of data to 
indicate a method to transmit said time related sequence of media 
data according to a transmission protocol, wherein said set of data 
is a time related sequence of data associated with and separate from 
said sequence of media data. 

27. The data signal of claim 26, wherein said time related sequence of media 
data is further associated with data packets associated with said transmission 
protocol, which comprises a data packet protocol. 

28. The data signal of claim 26, wherein said set of data indicates a relative 
bandwidth associated with transmission of said time related sequence of media 
data. 